Abstract
This paper addresses a relevant problem in Forensic Sciences by integrating radiological techniques with advanced machine learning methodologies to create a non-invasive, efficient, and less examiner-dependent approach to age estimation. Our study includes a new dataset of 12,827 dental panoramic X-ray images representing the Brazilian population, covering an age range from 2.25 to 96.50 years. To analyze these exams, we employed a model adapted from InceptionV4, enhanced with data augmentation techniques. The proposed approach achieved robust and reliable results, with a Test Mean Absolute Error of 3.1 years and an R-squared value of 95.5%. Professional radiologists have validated that our model focuses on critical features for age assessment used in odontology, such as pulp chamber dimensions and stages of permanent teeth calcification. Importantly, the model also relies on anatomical information from the mandible, maxillary sinus, and vertebrae, which enables it to perform well even in edentulous cases. This study demonstrates the significant potential of machine learning to revolutionize age estimation in Forensic Science, offering a more accurate, efficient, and universally applicable solution.
Introduction
The task of age estimation plays a pivotal role in forensic sciences and civil investigations, aiding in the reconstruction of biological profiles for missing-person cases, confirming the age of younger criminals, and assisting in situations where personal documents are unavailable. This process traditionally relies on morphological, biochemical, and radiological methods, with radiological approaches, particularly panoramic radiography, emerging as the preferred method due to its non-invasiveness, simplicity, and cost-effectiveness. Panoramic radiography allows for assessing dental development stages across all teeth simultaneously, offering a vital tool for age estimation1,2,3,4,5,6,7,8,9,10,11,12,13,14,15.
However, despite advances in radiological techniques, age estimation poses significant challenges, particularly in older individuals. Once dental development is completed, typically by the age of 24 with the closure of the third molar’s apex, traditional manual and visual assessment methods become less effective, creating a gap in accurately determining age in later stages of life2,4. Methods like the pulp/tooth area ratio calculation have been explored to address aging in older individuals, focusing on the deposition of secondary dentin16,17,18. However, these methods also face limitations, including introducing bias from examiner subjectivity and decreased effectiveness after age 2419,20. Additionally, the formation of reparative (tertiary) dentin, produced by odontoblasts as a defense mechanism against caries progression, further complicates age assessment. This process can appear radiographically similar to normal dental aging, leading to potential confusion. Such similarities are especially problematic when rehabilitated teeth are included in the sample, as they can obscure accurate age determination and introduce additional subjectivity21,22.
The integration of Artificial Intelligence (AI) technology, particularly Deep Neural Networks (DNNs), offers a promising solution to overcome the limitations of traditional age estimation methods. Recent studies have explored the potential of neural networks in automating the evaluation of dental development stages, demonstrating comparable accuracy to human observers and suggesting an avenue for enhancing chronological age detection23,24,25,26,27,28,29,30,31,32,33,34. By developing AI-powered solutions that analyze full panoramic radiograph images without the need for prior manual evaluation, it is possible to achieve faster, more efficient, and more accurate age estimation. This approach not only alleviates the reliance on specialist manual evaluations but also addresses the challenge of estimating age in older individuals, marking a significant advancement in forensic dentistry and anthropology.
While traditional age estimation methods have provided valuable insights, their limitations in assessing older individuals highlight the need for innovative solutions. The application of machine learning models in dental radiography represents a transformative step forward, offering a more reliable and efficient means of age estimation that can more effectively support forensic and civil investigations.
We propose a structured exploration of deep learning age estimation using panoramic dental radiographs in order to build a reliable workflow, depicted in Fig. 1. Initially, we delve into the Results section, which presents the empirical outcomes of applying our InceptionV4-based approach to a unique Brazilian subject dataset, covering age ranges from juveniles to adults. Building upon these results, the Discussion offers a deeper analysis, situating our findings within the broader context of forensic science and benchmarking against expert evaluations, thereby illustrating our research’s real-world applicability and implications. The Experiments section details the methodologies employed, from data collection and preprocessing to the intricacies of model training and validation. The Related Work section provides a literature review, positioning our research within the existing body of work and highlighting the potential of our contribution to forensic dentistry and future advancements.
Results
Our baseline experiment was developed with an adaptation of the InceptionV4 network with no data augmentation process. The training procedure used 80% of the subjects for training and an additional 10% of the subjects for validation. We achieved a Validation Mean Absolute Error of 3.83 ± 0.224 and a Mean Squared Error of 27.83 ± 0.326, indicating that the architecture could learn from the exams and predict chronological age after the training process.
We also evaluated a 10% holdout set composed of 1004 exams. To fully appreciate the nuances of our predictive model’s performance, we should consider various metrics extending beyond the conventionally used Mean Absolute Error (MAE) and Mean Squared Error (MSE). While the MAE of 3.88 ± 0.231 and the MSE of 26.47 ± 0.333 certainly offer valuable insights into the average model error, such as its consistency with the validation set, additional information can be gleaned from other metrics that capture the variability of prediction errors.
We assessed the median absolute error, which is 2.78 years. This measure serves as a helpful indicator of the typical error to expect, being less sensitive to extreme values than the mean. It also indicates that half of our predictions have an absolute error below this value.
Another important measure is the Interquartile Range (IQR) of the absolute error, calculated as 4.69 years. It offers a robust measure of the spread of our prediction errors. This metric is particularly useful because it represents the range within which the middle 50% of prediction errors fall, providing a comprehensive picture of the variability in our predictions while remaining less prone to distortion by potential outliers.
We computed the coefficient of determination (R-squared) to assess the extent to which age variance is explained by the features extracted from the PRs. Our \(R^2\) scored a value of 93%, which indicates that a substantial proportion of the variance in age can be explained by the features extracted from the exams.
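All of these metrics can be derived directly from the vectors of true and predicted ages. The sketch below, using NumPy and scikit-learn, shows one way to compute them; the function name and the example values are illustrative and not part of the study code.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_report(y_true, y_pred):
    """Summary metrics used in this section for a set of age predictions."""
    abs_err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    q1, q3 = np.percentile(abs_err, [25, 75])
    return {
        "mae": mean_absolute_error(y_true, y_pred),
        "mse": mean_squared_error(y_true, y_pred),
        "median_abs_err": np.median(abs_err),
        "iqr_abs_err": q3 - q1,  # spread of the middle 50% of absolute errors
        "r2": r2_score(y_true, y_pred),
    }

# Example with placeholder values (not the study's data):
# regression_report([30.0, 12.5, 65.2], [28.7, 13.1, 60.0])
```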
Lastly, we plotted the Bland-Altman graphic analysis, also known as a difference plot. Upon examining Fig. 2, we can identify that our model presents almost no systematic bias in error distribution, as the values predominantly cluster around −0.09 on the y-axis, and our confidence intervals are symmetric. The confidence intervals in the Bland-Altman plot are calculated as \(\text {CI} = \text {mean difference} \pm 1.96 \times \text {standard deviation of the differences}\). This formula provides the range within which 95% of the differences between predicted and actual values are expected to fall, assuming a normal distribution of errors. Based on the results from the t-test (p-value: 0.57), there is no statistical evidence to suggest that the predictions from our model are significantly different from the actual ages. Still, the uneven distribution of points along the x-axis confirms the existence of random errors.
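For reference, the quantities behind the plot follow directly from the formula above. The snippet below is a minimal sketch, assuming arrays of actual and predicted ages; the paired t-test mirrors the test reported in the text.

```python
import numpy as np
from scipy import stats

def bland_altman_stats(y_true, y_pred):
    """Mean difference, 95% limits of agreement, and paired t-test p-value."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    diff = y_pred - y_true                        # per-exam prediction error
    bias = diff.mean()                            # systematic bias (y-axis centre of the plot)
    half_width = 1.96 * diff.std(ddof=1)          # half-width of the 95% interval
    _, p_value = stats.ttest_rel(y_pred, y_true)  # tests whether the mean difference is zero
    return bias, (bias - half_width, bias + half_width), p_value
```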
Moreover, the Bland-Altman plot reveals a potential relationship between age and the precision of the model’s predictions. The observed cone-beam spread pattern suggests that the difference between actual and predicted values enlarges as age increases. This pattern may indicate that our model’s predictive accuracy decreases with higher age values.
This might occur for two reasons: dataset imbalance problems and natural aging complexity gain. To verify if this hypothesis is correct, we conducted a Pearson correlation test to examine whether the training frequency of images by age group was correlated with the model’s performance.
As Fig. 3 indicates, frequency and the model’s MAE have a strong negative correlation. This finding suggests that addressing the imbalance in our dataset could potentially enhance the overall performance of the age estimation model.
It is noteworthy that if we examine younger age groups, we can observe a positive coefficient, and the MAE tends to increase with age. For patients aged 0 to 19 years, we observe a Pearson correlation coefficient of 0.57. The coefficient is also strongly positive for the age group 20 to 39 years, 0.63. This apparent instance of Simpson’s paradox might suggest that the model’s performance could be affected by the imbalance in our data and the complexity inherent to aging.
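The correlation analysis itself is straightforward once the test errors are aggregated per age group. Below is a sketch of the idea using pandas and SciPy; the column names, bin width, and the `train_counts` series are assumptions for illustration rather than the exact grouping used in the study.

```python
import pandas as pd
from scipy.stats import pearsonr

def frequency_vs_mae(test_df, train_counts, bins=list(range(0, 100, 5))):
    """Correlate training-image frequency per age bin with the test MAE of that bin.

    test_df: one row per test exam with 'age' and 'pred' columns (assumed names).
    train_counts: pandas Series indexed by the same age bins with training-image counts.
    """
    binned = test_df.assign(
        age_bin=pd.cut(test_df["age"], bins=bins),
        abs_err=(test_df["age"] - test_df["pred"]).abs(),
    )
    mae_per_bin = binned.groupby("age_bin", observed=True)["abs_err"].mean()
    common = mae_per_bin.index.intersection(train_counts.index)
    return pearsonr(train_counts.loc[common], mae_per_bin.loc[common])
```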
We performed several data augmentation tests to tackle this issue, including balancing augmentations focusing on less frequent age groups. As described in the Augmentation Tuning subsection, our best result was observed by tripling the dataset through data augmentation, meaning each original image yielded three images in the augmented training set. Our new model achieved a validation MAE of 3.13 ± 0.19 and an MSE of 19 ± 0.27 and showed a shorter gap between training and validation errors during training.
In the holdout test, the augmented model demonstrated enhanced performance. It achieved an MAE of 3.1 ± 0.18 years, an MSE of 18.46 ± 0.27, a Median Absolute Error of 2.16 years, an IQR of 3.55 years, and a higher R-squared coefficient, indicating an overall improvement in prediction precision and a significant improvement in variability, as shown in Table 1.
The observed decrease in the MAE and MSE metrics suggests that incorporating new data through augmentation techniques may have allowed the model to learn more robust feature representations. Given that the model is making fewer significant errors in predictions, it is indicative that the model enhanced with augmentation is better at managing outliers and challenging cases; this might be important, especially for legal age confirmation uses.
Simultaneously, the reduction in the Median Absolute Error and Interquartile Range (IQR) indicates that our augmented model delivers more accurate predictions and more consistent results. This suggests an increased level of reliability and stability in our model, an assertion that can be further substantiated by examining our error distribution in the upcoming analysis.
In the Bland-Altman graphical analysis for the predictions from the augmented model, shown in Fig. 4, we can observe a minor systematic bias in the prediction error distribution. This bias is indicated by the predominant clustering of values around 0.25 on the y-axis, and our confidence intervals exhibit a slight skew toward positive errors. This suggests that our model slightly underestimates the patients’ age. However, based on the results from the t-test (p-value = 0.06), there is no solid statistical evidence to suggest that the predictions from our augmented model are significantly different from the actual ages.
The cone-beam spread pattern observed in the base model is also present in the augmented model. However, a notable reduction in the confidence interval’s width points to less prediction variability. This reduction indicates that our data augmentation strategy has successfully addressed some of the variability issues identified earlier.
The analysis of the Pearson correlation coefficients shown in Table 2 suggests that the augmentation strategy has had mixed effects on the performance of our model across different age groups. While it appears to have mitigated the impact of aging complexity, as seen in the 40–69 and 70+ age groups, it has significantly worsened the imbalance problem for the 0–19 age group, driven by a strong bias in the 0–5 age group.
To further analyze the performance of our model and assess potential biases, we examined the results based on sex and age groups. This evaluation is crucial to ensure that our model performs equitably across different demographic segments and to identify areas where performance disparities may exist. The following subsections detail these findings.
Results by biological sex
We analyzed the prediction errors separately for male and female subsets to investigate possible sex-related biases in our model, as shown in Fig. 5. For the female subgroup, the model achieved a Mean Absolute Error (MAE) of 3.01 ± 0.06 years and a Mean Squared Error (MSE) of 17.10 years. The Median Absolute Error was 2.22 years, with the 25th and 75th percentiles of Absolute Error at 0.84 and 4.30 years, respectively, resulting in an Interquartile Range (IQR) of 3.46 years. The R-squared (\(R^2\)) and Explained Variance both scored 0.958, indicating strong predictive power.
Similarly, the model achieved an MAE of 3.09 ± 0.06 years for the male subset and an MSE of 19.00 years. The Median Absolute Error was 2.08 years, with the 25th and 75th percentiles of Absolute Error at 0.82 and 4.30 years, respectively, resulting in an IQR of 3.47 years. The R-squared (\(R^2\)) and Explained Variance both scored 0.953, also indicating solid predictive power.
These results suggest that the model performs comparably for both sexes, with no substantial bias favoring one. The consistency in performance metrics across male and female subsets indicates that the model is robust and unbiased regarding sex-related predictions, even with the unbalanced dataset.
Results by age group
Recognizing that the accuracy of age estimation can vary significantly across different age ranges, we evaluated the model’s performance separately for individuals aged 24 and under and those over 24. For the younger age group, the model achieved a Mean Absolute Error (MAE) of 1.18 ± 0.02 years and a Mean Squared Error (MSE) of 2.74 years. The Median Absolute Error was 0.87 years, with the 25th and 75th percentiles of Absolute Error at 0.33 and 1.47 years, respectively, resulting in an Interquartile Range (IQR) of 1.14 years. The R-squared (\(R^2\)) was 0.911, with an Explained Variance of 0.914, reflecting the higher accuracy achievable in this age group due to the well-defined stages of dental development.
In contrast, for individuals over 24, the model achieved an MAE of 3.78 ± 0.06 years and an MSE of 23.93 years. The Median Absolute Error was 3.05 years, with the 25th and 75th percentiles of Absolute Error at 1.38 and 5.33 years, respectively, resulting in an IQR of 3.95 years. The R-squared (\(R^2\)) was 0.902, and the Explained Variance was 0.903. These results highlight the increased challenge of age estimation in older individuals due to the variability of age-related dental changes.
The continuous exposure of teeth to various environmental factors, such as cariogenic bacteria, dietary habits, and oral hygiene practices, contributes to this complexity. Additionally, genetic predispositions and the presence of restorative materials can further impact the accuracy of age estimation in older adults, leading to the discrepancy depicted in Fig. 6.
Discussion
Despite the mixed effects of augmentation, the overarching results underline the potential of our method, which leverages the InceptionV4 architecture combined with selected data augmentation techniques, in constructing a robust and reliable age estimation model for most age ranges. The overall improvement in every assessment metric indicates that our model is stable, consistent, and robust to outliers and complex prediction cases.
In our experiments, we also implemented several imbalanced augmentation strategies. These strategies were designed to generate more synthetic examples for less frequent age groups to address the performance issues originating from the imbalanced distribution of our dataset. However, none of these approaches successfully mitigated the adverse effect of augmentation on the 0–5 age group. Given that every prediction for patients within this age group is overestimated, it appears that this group is negatively affected by the augmentation of the 5–10 age group, which shares several robust features, such as the eruption of new teeth, and has a considerably larger number of available exams.
However, we posit that expanding the training set with new raw exam data, specifically from patients within the under five and over 70 age brackets, could potentially bolster the model’s performance and mitigate the previously highlighted limitations of our model.
We also submitted the 1004 testing image predictions, accompanied by their integrated gradients heatmaps, as shown in Fig. 7, to the scrutiny of two oral radiologists. Their assessments determined whether the predictions made by the model were consistent and if the model was appropriately focusing on the correct regions of the PRs.
They discovered that the neural network primarily focused on areas already used in forensic dentistry for age determination. These include pulp measurements, in line with the Cameriere method, and the calcification stages of permanent teeth as per the Demirjian and Moorrees methods. The weight the model gives to these areas provides validation from a clinical perspective, suggesting that it is not merely learning incidental patterns in the data but capturing meaningful physiological changes associated with the aging process.
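The heatmaps submitted to the radiologists are integrated-gradients attributions42. The paper does not state which implementation was used; the sketch below shows one way to produce such a map with the Captum library for a trained PyTorch regressor, purely as an illustration.

```python
import torch
from captum.attr import IntegratedGradients

def attribution_heatmap(model, image):
    """Integrated-gradients map for one preprocessed PR tensor of shape (1, C, 299, 299)."""
    model.eval()
    ig = IntegratedGradients(model)
    baseline = torch.zeros_like(image)          # black image as the reference input
    attributions = ig.attribute(image, baselines=baseline, n_steps=50)
    return attributions.abs().sum(dim=1).squeeze(0)  # collapse channels into a single heatmap
```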
The domain experts noticed that, at early ages, the integrated gradients heat maps suggest that the model focuses on the calcification stages of the permanent teeth and the degree of tooth eruption, which are features also used for manual age estimation when the subject is young.
In Fig. 7A, we have a young individual with a confirmed age of 3.67 years. In this image, it is clear that the areas where the network focuses its learning most are where odontogenesis occurs. We can visualize the tooth germs of permanent teeth, which originate from an invagination of the dental lamina of the equivalent deciduous tooth while that deciduous tooth is in the cap phase (during the 12th week of intrauterine life). The buds of the permanent teeth can be seen highlighted in yellow in the image.
With the growth and development of the permanent teeth, a natural process of rhizolysis of the primary teeth occurs. The deciduous teeth in different stages of rhizolysis are highlighted in blue in Fig. 7A. This means that while the permanent tooth grows, the root of the primary tooth, which serves as the “path” for the permanent tooth to erupt, is reabsorbed. When the root is entirely or mainly lost, the tooth softens and is exfoliated, creating space for the permanent tooth to erupt.
The eruption of deciduous teeth begins at six months with the lower central teeth and ends at 24 months with the eruption of the deciduous molars. Exfoliation of the deciduous teeth and exchange for their permanent analogs begins at 6–7 years of age with the lower central incisors and ends with the eruption of the third molar, which occurs between 16 and 24 years of age (except for impacted teeth, which never erupt in the oral cavity due to their positioning). The dental development table “Chronology of the human dentition”35, published in 1936, remains a well-accepted textbook reference and informed the works of Nolla et al.15 and Moorrees et al.2 on dental development and chronology.
Table 3 shows the chronology of eruption of deciduous and permanent teeth and the exfoliation of primary teeth. Based on this knowledge, we can manually estimate the approximate chronological age of the child in Fig. 7A. The experts noticed that the subject being assessed had not yet lost any primary teeth, but the permanent lower first molar had almost erupted. Considering that the central incisors have not yet been lost, the child must be around 4–5 years old. However, the position of the lower molar about to erupt can be a confounding factor, and in this case it is, as this tooth normally only appears in the oral cavity after age 6. Even with the model’s higher error in similar cases, the network focuses on the places where a trained human observer would immediately look.
The model seems to rely more on pulp measurements in older individuals, which is also an expected feature for this particular age group (Fig. 7B). Comparing Fig. 7B and C, we can also notice that older teeth (C) have a smaller space containing the pulp and a much thicker layer of dentin around it, making it almost impossible to see the pulp in some teeth, especially the anterior teeth. As a person ages, the pulp cavity of a tooth diminishes in size due to the continuous production of secondary dentin throughout a person’s lifetime. This newly formed dentin gradually deposits on all the inner walls of the pulp chamber and root canals, resulting in a natural reduction in the size of the pulp cavity.
Comparing our test with other studies that use only manual assessment methods, we realized that a full panoramic X-ray allows the analysis to cover a broader age spectrum without cropping or focusing on specific areas of interest. Manual tests are restricted when used at older ages. The methods of Demirjian4, Nolla15, Moorrees2 and their associates have difficulty evaluating older ages because, after the closure of the apex of the third molars, there are no further changes in the appearance of tooth development. This phase occurs between the ages of 18 and 22. Our model performs very well up to 39 years of age.
We also find significant improvements when comparing with the Cameriere5 method. His studies, which evaluate the closing of the apices of permanent teeth, also fail to reach advanced ages.
By evaluating the entire oral cavity, our network can detect differences already known in the classical literature and combine them, improving its performance and reliability when compared with manual methods.
We hypothesize that the model achieves excellent performance by analyzing the pulp chamber volume up to the age range of 39–45 years. Beyond this, measuring these features in a panoramic radiograph image becomes challenging. It is crucial to note that a PR is a two-dimensional image of a three-dimensional object, which could lead to proportion distortions. This might suggest that models trained on Dental Cone Beam CT could potentially outperform the proposed model.
Another factor that must be considered is that with increasing age, the incidence of tooth loss and rehabilitative treatments (such as dental implants) tends to increase. However, these factors do not precisely indicate age since accidents or other circumstances leading to tooth loss can occur at any stage of life.
Our results highlight the model’s consistent performance across biological sexes, demonstrating no substantial bias towards either male or female predictions. This lack of bias underscores the model’s robustness and reliability in diverse populations. Moreover, the analysis by age groups revealed a notable difference in accuracy: the model achieved higher precision for individuals aged 24 and under, likely due to the well-defined stages of dental development in younger individuals. Conversely, the increased error rates in the older age group reflect the complexities introduced by continuous dental changes and environmental factors. These findings suggest that while the model is effective across different sexes and age groups, there is room for improvement in accurately estimating ages in older individuals, potentially through integrating more sophisticated features or additional data from these age groups.
Interestingly, according to the experts’ observations, in the absence of teeth, the network predominantly relied on the same features as Human Experts: information from the mandibular ramus, mandibular canal, and vertebrae (Fig. 7C). This not only indicates that our model is capable of utilizing a wide range of anatomical features, thus demonstrating adaptability and robustness in the face of varied input data but also reaffirms that the model is making predictions based on expected image characteristics, further attesting to its reliability and consistency in age estimation tasks. However, as previously mentioned, panoramic radiographs do not provide an accurate and faithful proportion of the area of interest, and this may also explain the model’s declining performance when evaluating older individuals.
Experiments
Methods
This study was approved by the Federal University of Pernambuco ethics committee and the Center for Medical Sciences (CAAE 42878921.6.0000.5208). The panoramic X-ray images were extracted from the university database and comprise exams from the state of Pernambuco. A Brazilian dataset may be more representative due to its high level of miscegenation: throughout its history, people from Europe, Africa, and Asia immigrated to the country and joined the native Brazilians, creating a mixture of genetic characteristics36.
For this purpose, we created a dataset of 12,827 images acquired between 2017 and 2018. The ages of the patients range from 2.25 to 96.50 years. Unlike other studies, we decided not to exclude samples or preprocess them to create a dataset with characteristics closer to everyday life, facilitating the practical use of our models to analyze the individual’s age.
Two dentists, both specialists with extensive experience in radiology and forensic dentistry, evaluated the sample images prior to the study to identify any positioning errors and technical flaws. Only images that met the clinical standards for use were included in the study.
Pipeline
The proposed pipeline, shown in Fig. 8, encompasses several key stages, each meticulously designed to enhance the accuracy and robustness of our predictive models. These stages, outlined below, ensure a systematic approach, balancing the depth of data exploration and the precision of modeling techniques. This rigorous structure not only aids us in predicting chronological age but also sets the foundation for future explorations, such as predicting biological sex.
- Data Collection: Our study involved the meticulous gathering of 12,827 panoramic radiograph images and corresponding patient data sourced from the biobase of Universidade Federal de Pernambuco. To facilitate this extensive data acquisition, we employed a custom-designed web scraping tool tailored for this task. This approach ensured the efficient collection of pertinent information and maintained the integrity and reliability of the data obtained, which is critical for the accuracy and success of our research.

- Data Quality Analysis: In this stage, we meticulously analyzed the acquired samples and discarded nine files due to inconsistencies, such as X-ray clippings or corrupted images. Additionally, we excluded 2782 exams from patients who underwent two or more exams. This decision was made to prevent data leakage, as retaining multiple exams from the same patient could introduce bias and compromise the integrity of the training process. To ensure data integrity and reproducibility, we hashed the images, which facilitates future auditability and validation of the data utilized at each step of our pipeline (see the hashing sketch after this list).

- DataLoader Structuring: We created custom DataLoaders that standardize image sizes to 299x299, normalize pixel values between 0 and 1, perform several data augmentation strategies on the training sets, and configure image batches of 32 exams to feed the machine learning models, as detailed in the Data Preprocessing subsection.

- Model Construction: Next, we trained several versions of the modified InceptionV4 model in fully automated pipelines, following state-of-the-art computer vision practices such as weight initialization, learning rate variation, and early stopping to prevent overfitting.

- Structured Experimentation and Tunings: In a Weights and Biases37 controlled training environment, we conducted structured experimentation on data augmentation possibilities and hyperparameters to optimize the performance of our models.

- Model Evaluation: To finish the pipeline, we assess our models’ robustness and generalization capabilities while considering the degradation conditions found in the PRs and accounting for the unique characteristics of the Brazilian population. With the help of domain experts, we ensure that our models can perform effectively across a broad range of PRs and, most importantly, that their accuracy is based on valid reasoning.
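As referenced in the Data Quality Analysis step, hashing each exam makes the pipeline auditable and exposes duplicate images. A minimal sketch follows, assuming SHA-256 over the raw file bytes and a directory of exported image files; the paper does not specify the hash function or file layout.

```python
import hashlib
from pathlib import Path

def hash_exams(image_dir: str) -> dict:
    """Map each exam file name to a content hash for auditing and duplicate detection."""
    hashes = {}
    for path in sorted(Path(image_dir).glob("*.jpg")):  # adjust the pattern to the export format
        hashes[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes

# Identical hash values flag duplicated exams, which can then be removed
# before the train/validation/test split to avoid data leakage.
```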
All experiments in our study were conducted using Python 3.10.11 and PyTorch 2.0, ensuring global reproducibility. We achieved this by setting persistent parameters for random number generation with a seed value of 0, ensuring that each experiment run produces consistent results and eliminating variations due to random initialization. The experiments were performed on identical hardware configurations, specifically an Nvidia RTX 3060 GPU with 12 GB of VRAM, further contributing to the reliability and comparability of our results.
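A typical way to pin the generators to the seed of 0 mentioned above is shown below; the cuDNN flags are an additional assumption on our part, commonly combined with fixed seeds for repeatable GPU runs.

```python
import random
import numpy as np
import torch

def set_global_seed(seed: int = 0) -> None:
    """Fix every random number generator involved in training to one seed."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True  # repeatable convolution algorithms
    torch.backends.cudnn.benchmark = False
```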
Our experimental setup included several hyperparameters to fine-tune the model. We used the InceptionV4 model with an input size of 299x299 pixels, a batch size of 32, and a data augmentation strategy to enhance the training process. The augmentation configuration included horizontal flips, random brightness, contrast adjustments, rotations, translations, zoom, and erasing, all applied with specified probabilities and factors. We trained the model for 100 epochs with a learning rate of 0.001, using the Adam optimizer and a ReduceLROnPlateau learning rate scheduler. Regularization techniques such as Batch Normalization, Early Stopping with patience of 20 epochs, and a Dropout layer with a rate of 0.7 were employed to prevent overfitting. The model was trained as a single task to predict age in years, using Mean Absolute Error (MAE) as the loss function.
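Putting those hyperparameters together, a condensed training loop could look like the sketch below. It is a simplified illustration, not the released study code: the scheduler arguments beyond the mode, the device handling, and the loader variables are assumptions.

```python
import torch
from torch import nn, optim

def train(model, train_loader, val_loader, device, epochs=100, lr=1e-3, es_patience=20):
    """Adam + MAE loss, ReduceLROnPlateau on the validation MAE, early stopping (patience 20)."""
    criterion = nn.L1Loss()                                    # Mean Absolute Error in years
    optimizer = optim.Adam(model.parameters(), lr=lr)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min")
    best_val, stale = float("inf"), 0
    for _ in range(epochs):
        model.train()
        for images, ages in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images.to(device)).squeeze(1), ages.to(device))
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val_mae = sum(criterion(model(x.to(device)).squeeze(1), y.to(device)).item()
                          for x, y in val_loader) / len(val_loader)
        scheduler.step(val_mae)
        if val_mae < best_val:
            best_val, stale = val_mae, 0
        else:
            stale += 1
            if stale >= es_patience:                           # stop once validation MAE stalls
                break
    return best_val
```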
To further enhance the transparency and reproducibility of our study, we made the Weights & Biases runs public, providing additional information and reinforcing the strengths of our paper. Our Weights & Biases reports can be accessed at https://api.wandb.ai/links/ai-odontology-research/asdqxorpt.
Dataset description and distribution
When employing methods for age estimation, including Machine Learning methods, it is essential to consider a particular population’s unique geographical, socio-nutritional, and hormonal factors to minimize individual differences in dental development25. This regional approach is essential for increasing the precision of age estimations using dental radiographs.
We collected a new dataset of 12,827 dental panoramic X-ray images. These images encompass a diverse patient demographic regarding age, biological sex, and dental development stages. Each image in the dataset has been meticulously labeled with a unique \(image\_id\) and correlated with corresponding age and sex details, as verified from patients’ identification documents. The dataset’s composition is detailed in an accompanying XML file, enabling precise matching and retrieval of patient information for each image.
A rigorous data curation process was undertaken to ensure the integrity and applicability of our dataset for developing a reliable age estimation model. Before the analysis, we meticulously filtered the dataset to remove any panoramic dental X-rays that did not meet the stringent criteria necessary for high-quality research. This filtration process involved excluding images that were mere segments of full PRs, upload failures, cases with duplicate patient entries, and those with identical hashes, among other base issues. Consequently, the original collection of 12,827 images was distilled down to a refined dataset comprising 10,035 complete and verified PRs. This refined dataset thereby enhances the validity of our study and the accuracy of our predictive modeling.
To provide a clear visual representation of the dataset’s demographic spread and ascertain its representative nature for the Brazilian population, we have included a histogram that depicts the age distribution of the patients (see Fig. 9). The histogram is not only instrumental in illustrating the skewness of age distribution but also in identifying potential outliers or anomalies within the data. Moreover, the histogram reveals a significant imbalance in the age groups represented, with a lower density at the younger and older extremes of the age scale and a higher concentration in the middle age ranges. This distributional pattern suggests that age estimation models may require adjustments or weighting to compensate for the under-representation of data at both ends of the age spectrum, ensuring that the models are robust and reflective of the entire population.
The statistical analysis of the dataset provides a comprehensive overview of the demographic characteristics. The mean age of the individuals in the dataset is 38.25 years, indicating the average age of the patient group. The median age, representing the middle value in the age distribution, is slightly lower at 36.50 years. This subtle difference between the mean and median may suggest a distribution that is not perfectly symmetrical. The standard deviation, a measure of the variation or dispersion of ages, is relatively high at 20.25 years, underscoring the broad age range of the subjects included, spanning from as young as 2.25 years to as old as 96.50 years. The gender distribution within the dataset is also noteworthy, with 4,259 males and 5,776 females, reflecting a greater representation of female subjects.
To the best of our knowledge, the incorporation of this new dataset positions our study at the forefront of advancing forensic clinical research using PRs tailored to the distinctive characteristics of the Brazilian population, ensuring that the predictive modeling is as relevant and accurate as possible. Additionally, this dataset can be made available upon request via email, followed by completing a detailed form to safeguard privacy and maintain data confidentiality. This process is in place to ensure the ethical and secure use of the data in research endeavors.
Data pre-processing
Our data preprocessing begins with the development of custom DataLoaders, which are crucial for standardizing image sizes to 299x299 pixels and normalizing pixel values to a range between 0 and 1. This step ensures that our models receive consistently formatted input, a critical factor for the success of image analysis and model performance.
A significant portion of our preprocessing involves data augmentation on training sets. We apply a range of transformations, such as flipping, rotation, and adjustments in brightness and contrast, to enhance the dataset. These augmentations simulate various imaging conditions, improving the model’s generalization capabilities. Alongside these augmentations, we ensure the normalization of pixel values across all images to maintain consistency, which is crucial for accurate image processing.
We leverage Python scripts with integrated libraries like Torchvision and PIL to execute these preprocessing tasks. These scripts are adept at loading images, applying transformations, and saving the resulting augmented images. Each image is resized to 299x299 pixels and undergoes pixel normalization to meet our strict processing standards.
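A minimal version of such a DataLoader setup, assuming the metadata is held in a pandas DataFrame with hypothetical 'path' and 'age' columns, could look as follows. It is an illustrative sketch of the preprocessing described above rather than the exact study code.

```python
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image

class PanoramicDataset(Dataset):
    """Yields (image tensor, age in years) pairs from a metadata DataFrame."""
    def __init__(self, frame, transform):
        self.frame, self.transform = frame.reset_index(drop=True), transform

    def __len__(self):
        return len(self.frame)

    def __getitem__(self, idx):
        row = self.frame.iloc[idx]
        image = Image.open(row["path"]).convert("RGB")
        return self.transform(image), torch.tensor(row["age"], dtype=torch.float32)

base_transform = transforms.Compose([
    transforms.Resize((299, 299)),  # InceptionV4 input size
    transforms.ToTensor(),          # scales pixel values to the [0, 1] range
])

# loader = DataLoader(PanoramicDataset(train_frame, base_transform), batch_size=32, shuffle=True)
```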
An innovative feature we introduced to our methodology is an augmentation multiplier. Derived from the age group distribution within our dataset, this factor permits increased augmentation for underrepresented groups, thereby balancing the dataset. This strategy was developed to minimize bias in the model’s output, although it did not fully succeed.
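The exact multiplier values are tied to the age-group counts in our data; the snippet below illustrates the general idea only, with the bin width and the cap on the factor chosen arbitrarily for the example.

```python
import pandas as pd

def augmentation_multipliers(ages, bins=list(range(0, 101, 10)), cap=5):
    """Give under-represented age groups a larger synthetic-copy factor."""
    groups = pd.cut(pd.Series(ages), bins=bins)
    counts = groups.value_counts()
    # The most frequent group keeps a factor of 1; rarer groups get proportionally more,
    # bounded by `cap` to avoid flooding the set with near-duplicates.
    return (counts.max() / counts).clip(upper=cap).round().astype(int)
```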
The final stage in our preprocessing pipeline involves systematic storage and management of the processed images and their associated metadata. We employ Pandas DataFrames and structured file-naming systems for efficient data organization. This approach is critical for handling large datasets and ensures streamlined access during the model training phase.
Augmentation tuning
Drawing insights from the studies developed in26,30,31,33,38, and starting with our base model without augmentation as a foundation, we embarked on over 30 experimental trials. The objective was to discern the augmentation strategies that were most effective and could be seamlessly integrated into our DataLoader function. The configuration that resulted in the most promising outcomes is outlined in Table 4.
The adoption of this configuration led to a notable narrowing of the gap between the Training and Validation Mean Absolute Error. This points to an enhancement in the model’s capacity for generalization.
It is worth noting that the factor values for augmentation were intentionally configured to effect minor alterations, aligning with findings in related literature, especially as recommended in Ref.28,33. This decision was taken to strike a balance between introducing variability through augmentation and preserving the essential features inherent in the original images, given that PR acquisition is usually consistent and does not typically exhibit significant variability.
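In Torchvision terms, the kind of mild pipeline described above could be expressed as follows. The probabilities and magnitudes here are placeholders for illustration; the values actually used are those listed in Table 4.

```python
from torchvision import transforms

train_augmentation = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.1, contrast=0.1),   # mild brightness/contrast jitter
    transforms.RandomAffine(degrees=5, translate=(0.02, 0.02), scale=(0.95, 1.05)),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.1, scale=(0.01, 0.03)),    # applied after conversion to a tensor
])
```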
After the data quality analysis process and image selection, we have 10,035 unique patient exams. Our experiments started with a training set of 8,027 randomly chosen images, which represents 80% of the total dataset. To enhance the dataset and improve our model’s performance, we applied a data augmentation technique that triples the size of the training set. Consequently, the augmented training set comprises 24,081 images, providing a more diverse collection for training our architecture.
Model selection
In Ref.34, the authors employed a Neural Architecture Search approach to evaluate various factors, such as Kernels, Multi-Branch, Architecture Depth, and pre-trained weights for transfer learning. Their findings revealed that smaller architectures using multi-branch, asymmetric kernels, and no pre-trained weights tend to yield better performance and prevent overfitting.
In light of these findings, our initial approach to estimate chronological age using the PRs was conducted using the InceptionV4 architecture presented in Ref.39 without fine-tuning as our encoder, followed by a dropout layer and two fully connected layers for decoding and performing a regression task, respectively. Although not a tiny architecture, it incorporates multi-branch and asymmetric kernels, consistent with the configuration outlined in Hou’s work.
Figure 10 depicts our proposed network architecture. We employ the Rectified Linear Unit (ReLU) activation function for most layers. This choice is motivated by ReLU’s proven effectiveness in counteracting the vanishing gradient problem, thus promoting more efficient learning during the network’s training phase40. However, for the FC 2 layer, designated for our regression task, we intentionally leave it as a linear layer without any activation function, allowing for a direct linear transformation of its inputs.
As shown in Fig. 10, the InceptionV4 architecture begins with a ’Stem’ module, conducting a series of convolutional operations on the input image to kickstart feature extraction across diverse scales and depths.
Next, the Inception-A module is used. It consists of parallel convolutions of varying types designed to capture a broad spectrum of features from the image. Following Inception-A, the Reduction-A module comes into play. This module decreases the dimensionality of the feature maps, thereby amplifying computational efficiency through pooling operations and stride-2 convolutions.
Following this, the Inception-B module is deployed. Similar to Inception-A, it also carries out parallel convolutions but with distinct types of convolutions to extract more complex features. After Inception-B, the Reduction-B module is employed, yet again aiming to reduce the dimensionality of feature maps.
Subsequently, the Inception-C module is enacted, focusing on extracting more refined and detailed features. After this stage, an Adaptive Average Pooling layer comes into play. It reduces the spatial dimensions of the feature maps to a single vector, preserving the most important spatial information.
Upon completing the Adaptive Average Pooling layer, a Dropout layer with a dropout rate of 0.7 is implemented. This serves as a regularization strategy, preventing model overfitting by sporadically “dropping out” a portion of the neuron outputs during training, thus encouraging a more balanced distribution of neuron weights.
Finally, two fully connected (FC) layers, FC1 and FC2, are harnessed to decode the latent space representation vector. The FC1 layer digests the broad set of features extracted by the previous layers, discerning which combinations are most crucial to the final decision. The FC2 layer is entrusted with producing the model’s output predictions. In particular, it assesses all combinations passed on by the FC1 layer and generates a singular output, which is the predicted age of the patient based on the PR.
The collaboration of these modules allows our Adapted Inception-v4 architecture to learn various features at different levels of abstraction and scale, thereby maximizing its performance in computer vision tasks.
To ensure a beneficial start to the learning process, we decided to initialize our kernel weights using the Kaiming Normal Method41, also known as He Weights Initialization. It has proven to be significantly advantageous when working with ReLU activation functions, maintaining the variance of the inputs across the network layers, thus avoiding issues like signal fading or exploding gradients during backpropagation.
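A compact way to realize the architecture described above, assuming the timm implementation of InceptionV4 as the encoder (the paper follows Szegedy et al.39 but does not name a specific library) and a hypothetical hidden width for FC1, is sketched below.

```python
import timm
import torch
from torch import nn

class AgeRegressor(nn.Module):
    """InceptionV4 encoder, dropout(0.7), FC1 with ReLU, and a linear FC2 output."""
    def __init__(self, hidden=512):                 # hidden width is an illustrative choice
        super().__init__()
        # num_classes=0 keeps the global average pooling and drops the classifier head;
        # pretrained=False matches the decision not to use transfer-learning weights.
        self.encoder = timm.create_model("inception_v4", pretrained=False, num_classes=0)
        self.dropout = nn.Dropout(p=0.7)
        self.fc1 = nn.Linear(self.encoder.num_features, hidden)
        self.fc2 = nn.Linear(hidden, 1)             # linear output: predicted age in years
        for m in self.encoder.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight)   # He (Kaiming Normal) init on conv kernels
        for layer in (self.fc1, self.fc2):
            nn.init.kaiming_normal_(layer.weight)
            nn.init.zeros_(layer.bias)

    def forward(self, x):
        features = self.dropout(self.encoder(x))    # pooled feature vector -> dropout
        return self.fc2(torch.relu(self.fc1(features)))
```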
In future experiments, we plan to adapt the network to perform a multitask classification, employing a single encoder for simultaneous age and biological sex prediction. This approach aims to leverage the shared features within the dataset to enhance the model’s efficiency and effectiveness in handling multiple tasks concurrently.
Related work
Numerous research initiatives have been undertaken to automate age estimation using machine learning algorithms in this rapidly advancing scenario. We can highlight a study by Hou and colleagues34. Other studies focusing on age and biological sex estimation through panoramic radiographs (PRs) have also provided valuable insights23,24,25,26,27,28,29,30,31,32,33. The use of computer vision models and methods was also investigated38,39,41,42. A brief summary of the related works can be seen in Table 5. Our project builds upon these studies, extending their findings to Brazilian patients and incorporating additional explainability methods.
The most relevant related works have adopted various approaches; some focused on identifying the stage of mineralization for subsequent age classification according to Demirjian’s method28,31, others aimed to classify by broad age groups23,26,29,30, and some even focused on the prediction of exact age24,33,34.
The preprocessing techniques varied significantly, using both manual and computerized processing for feature extraction and subsequent inference tasks. In general, the works that most closely align with our goal are those related to the exact prediction of age, with particular emphasis on Vila-Blanco et al. (2020)33 and Hou et al. (2021)34, who obtained promising results on the broad age range of their respective populations, as shown in Table 6:
With the growing interest in assessing dental age and the development of improved study methods that combine dental radiology and neural networks, we realized that a paper carried out with a Brazilian sample could obtain different results due to the extensive miscegenation that has occurred here since the 16th century. The mixture of DNA from immigrants from Asia, Africa, and Europe36 creates a well-balanced sample, making it possible to generalize the results better. Training a model with samples from individuals of diverse ancestries makes this experiment unique and provides a more balanced dataset and benchmark than other studies.
Conclusion
Our research extensively investigates the application of deep learning models and data augmentation techniques for age estimation using panoramic radiograph images. We demonstrated that the InceptionV4 architecture, when paired with a meticulous augmentation strategy, can be a formidable tool in building robust and reliable age prediction models. The efficiency of our model was notable in its performance across the majority of age ranges, even with the initial challenges presented by an imbalanced dataset.
Through a series of experiments and evaluations, we determined that our data augmentation strategies significantly improved the model’s generalization capabilities, as illustrated by the reduction in Mean Absolute Error and Mean Squared Error metrics. Moreover, we noticed an enhancement in model stability and reliability, as shown by the marked decrease in the median and interquartile range of the absolute error.
However, our methodology exhibited limitations, particularly in handling the complexity of minority age groups at the extremes of the age spectrum (0–5 years and 70+ years). This indicates a need for future research to explore more targeted augmentation strategies or alternative methodologies, such as Balancing Generative Adversarial Networks (BAGANs), which could better address the unique characteristics of these age groups during dataset augmentation to achieve a uniform age distribution.
Even though our new dataset is a pioneering resource for advancing forensic clinical research using PRs tailored to the distinct demographics of the Brazilian population, it also has limitations. These include the imbalance mentioned above and its geographic confinement to a single region of Brazil, which may result in suboptimal performance for Brazilians from other regions.
Another factor contributing to the decline in accuracy with advancing age could be the continuous exposure of teeth to the oral environment after the eruption of the first permanent tooth. This environment includes cariogenic and tartar-forming bacteria, dietary habits, oral hygiene practices, parafunctional habits, and genetic predispositions such as diabetes, which may exacerbate conditions like periodontitis. These factors can significantly impact dental health outcomes.
All these influences on an individual’s oral condition can lead to errors in age detection, especially in cases of early tooth loss, teeth rehabilitated with crowns and implants, significant bone loss due to periodontal disease, and the presence of restorative materials in deep cavities. In this study, we included patients with all these conditions to mimic the daily operations of a dental radiology clinic closely. While excluding cases involving restored teeth or focusing solely on a specific dental group could potentially enhance our model’s performance, our goal was to adopt a broader approach that reflects the diversity and complexity of real-world clinical scenarios.
To address these limitations, we have collaborated with researchers from Faculdade de Odontologia de Piracicaba-Unicamp to expand our dataset with new exams from patients across new regions, aiming to increase its robustness and geographic inclusivity in our future work.
Furthermore, the insights gained from the evaluations of the model’s predictions by oral radiologists have proven invaluable, emphasizing the critical role of integrating expert human analysis into the model validation process.
Future work in collaboration with human experts also includes a planned longitudinal study designed to assess the predictive power of our model over time, using the same patients at different ages. This study could provide critical insights into how the network handles the progression of aging. This methodology will validate our model’s robustness and identify any necessary modifications to improve its predictive validity. Consequently, we can better comprehend and adapt our model to the dynamics of aging, ensuring its increased accuracy and reliability for future predictions.
Our research offers crucial insights into age estimation using PRs with deep learning and data augmentation. It underscores the complexities of handling imbalanced datasets and variable performance across different age groups. Our work also paves the way for further investigations to enhance the accuracy and robustness of age estimation models and exemplifies the potential of AI applications in dentistry and radiology.
Data availability
The datasets analyzed during the current study are available from the corresponding author upon reasonable request. These datasets include panoramic dental radiographs and their associated annotations. Due to privacy and ethical considerations, some data may be subject to anonymization procedures before being shared. Interested researchers are encouraged to contact the corresponding author to discuss data access, which will be facilitated in compliance with applicable data protection regulations and ethical guidelines. The entire request process is available on the following Github: https://github.com/willianfco/ehaml-brazilians-dataset.
Code availability
The code developed for the Adapted InceptionV4 model and analysis presented in this study is available for open access. It has been deposited in a publicly accessible repository, ensuring transparency and enabling replication of the research findings. For access to the code, including detailed documentation on its usage, installation instructions, and examples, please visit https://github.com/willianfco/ehaml-brazilians-code. The repository contains all necessary scripts for data preprocessing, model training, evaluation, and results generation. The authors welcome contributions from the research community to refine further and enhance the utility of the code.
References
Dalitz, G. Age determination of adult human remains by teeth examination. J. Forensic Sci. Soc. 3, 11–21 (1962).
Moorrees, C. F., Fanning, E. A. & Hunt, E. E. Jr. Age variation of formation stages for ten permanent teeth. J. Dent. Res. 42, 1490–1502 (1963).
Bang, G. & Ramm, E. Determination of age in humans from root dentin transparency. Acta Odontol. Scand. 28, 3–35 (1970).
Demirjian, A., Goldstein, H. & Tanner, J. M. A new system of dental age assessment. Hum. Biol. 211–227 (1973).
Cameriere, R., Cingolani, M. & Ferrante, L. Variations in pulp/tooth area ratio as an indicator of age: A preliminary study. J. Forensic Sci. 49, JFS2003259 (2004).
Spalding, K. L., Buchholz, B. A., Bergman, L.-E., Druid, H. & Frisén, J. Age written in teeth by nuclear tests. Nature 437, 333–334 (2005).
Alkass, K. et al. Age estimation in forensic sciences: Application of combined aspartic acid racemization and radiocarbon analysis. Mol. Cell. Proteomics 9, 1022–1030 (2010).
Rajkumari, S., Nirmal, M., Sunil, P. & Smith, A. A. Estimation of age using aspartic acid racemisation in human dentin in Indian population. Forensic Sci. Int. 228, 38–41 (2013).
Elfawal, M. A., Alqattan, S. I. & Ghallab, N. A. Racemization of aspartic acid in root dentin as a tool for age estimation in a kuwaiti population. Med. Sci. Law 55, 22–29 (2015).
Bekaert, B., Kamalandua, A., Zapico, S. C., Van de Voorde, W. & Decorte, R. Improved age determination of blood and teeth samples using a selected set of DNA methylation markers. Epigenetics 10, 922–930 (2015).
Puranik, M., Priyadarshini, C. & Uma, S. R. Dental age estimation methods: A review. Int. J. Adv. Health Sci. 1, 19–25 (2015).
Chen, S., Lv, Y., Wang, D. & Yu, X. Aspartic acid racemization in dentin of the third molar for age estimation of the Chaoshan population in south china. Forensic Sci. Int. 266, 234–238 (2016).
Benjavongkulchai, S. & Pittayapat, P. Age estimation methods using hand and wrist radiographs in a group of contemporary thais. Forensic Sci. Int. 287, 218-e1 (2018).
Márquez-Ruiz, A. B., González-Herrera, L., Luna, J. D. & Valenzuela, A. DNA methylation levels and telomere length in human teeth: Usefulness for age estimation. Int. J. Legal Med. 134, 451–459 (2020).
Nolla, C. M. et al. The Development of Permanent Teeth (University of Michigan, 1952).
Cameriere, R. et al. Reliability in age determination by pulp/tooth ratio in upper canines in skeletal remains. J. Forensic Sci. 51, 861–864 (2006).
Cameriere, R., Ferrante, L. & Cingolani, M. Age estimation in children by measurement of open apices in teeth. Int. J. Legal Med. 120, 49–52 (2006).
Cameriere, R., De Luca, S., Alemán, I., Ferrante, L. & Cingolani, M. Age estimation by pulp/tooth ratio in lower premolars by orthopantomography. Forensic Sci. Int. 214, 105–112 (2012).
Morse, D. R., Esposito, J. V., Schoor, R. S., Williams, F. L. & Furst, M. L. A review of aging of dental components and a retrospective radiographic study of aging of the dental pulp and dentin in normal teeth. Quintessence Int. 22 (1991).
Fernandes, M. M. et al. Age estimation by measurements of developing teeth: Accuracy of Cameriere’s method on a Brazilian sample. J. Forensic Sci. 56, 1616–1619 (2011).
Farges, J.-C. et al. Dental pulp defence and repair mechanisms in dental caries. Mediators Inflamm. 2015, 230251 (2015).
Ricucci, D., Loghin, S., Lin, L. M., Spångberg, L. S. & Tay, F. R. Is hard tissue formation in the dental pulp after the death of the primary odontoblasts a regenerative or a reparative process?. J. Dent. 42, 1156–1170 (2014).
Kim, S., Lee, Y.-H., Noh, Y.-K., Park, F. C. & Auh, Q.-S. Age-group determination of living individuals using first molar images based on artificial intelligence. Sci. Rep. 11, 1073 (2021).
Shen, S. et al. Machine learning assisted Cameriere method for dental age estimation. BMC Oral Health 21, 1–10 (2021).
Galibourg, A. et al. Comparison of different machine learning approaches to predict dental age using demirjian’s staging approach. Int. J. Legal Med. 135, 665–675 (2021).
Santosh, K. et al. Machine learning techniques for human age and gender identification based on teeth X-ray images. J. Healthc. Eng. 2022 (2022).
Zaborowicz, K., Garbowski, T., Biedziak, B. & Zaborowicz, M. Robust estimation of the chronological age of children and adolescents using tooth geometry indicators and pod-gp. Int. J. Environ. Res. Public Health 19, 2952 (2022).
De Tobel, J., Radesh, P., Vandermeulen, D. & Thevissen, P. W. An automated technique to stage lower third molar development on panoramic radiographs for age estimation: A pilot study. J. Forensic Odontostomatol. 35, 42 (2017).
Štepanovskỳ, M., Ibrová, A., Buk, Z. & Velemínská, J. Novel age estimation model based on development of permanent teeth compared with classical approach and other modern data mining methods. Forensic Sci. Int. 279, 72–82 (2017).
Avuçlu, E. & Başçiftçi, F. New approaches to determine age and gender in image processing techniques using multilayer perceptron neural network. Appl. Soft Comput. 70, 157–168 (2018).
Banar, N. et al. Towards fully automated third molar development staging in panoramic radiographs. Int. J. Legal Med. 134, 1831–1841 (2020).
Merdietio Boedi, R. et al. Effect of lower third molar segmentations on automated tooth development staging using a convolutional neural network. J. Forensic Sci. 65, 481–486 (2020).
Vila-Blanco, N., Carreira, M. J., Varas-Quintana, P., Balsa-Castro, C. & Tomas, I. Deep neural networks for chronological age estimation from opg images. IEEE Trans. Med. Imaging 39, 2374–2384 (2020).
Hou, W. et al. Exploring effective DNN models for forensic age estimation based on panoramic radiograph images. In 2021 International Joint Conference on Neural Networks (IJCNN) 1–8 (IEEE, 2021).
Logan, W., Kronfeld, R. & McCall, J. O. Chronology of the human dentition (1936).
Rodrigues de Moura, R., Coelho, A. V. C., de Queiroz Balbino, V., Crovella, S. & Brandão, L. A. C. Meta-analysis of brazilian genetic admixture and comparison with other Latin America countries. Am. J. Hum. Biol. 27, 674–680 (2015).
Weights & Biases. https://wandb.ai/site.
Mumuni, A. & Mumuni, F. Data augmentation: A comprehensive survey of modern approaches. Array 16, 100258 (2022).
Szegedy, C., Ioffe, S., Vanhoucke, V. & Alemi, A. Inception-v4, inception-resnet and the impact of residual connections on learning. In: Proc. AAAI conference on artificial intelligence, Vol. 31 (2017).
Hu, Z., Zhang, J. & Ge, Y. Handling vanishing gradient problem using artificial derivative. IEEE Access 9, 22371–22377. https://doi.org/10.1109/ACCESS.2021.3054915 (2021).
He, K., Zhang, X., Ren, S. & Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: Proc. IEEE international conference on computer vision, 1026–1034 (2015).
Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In International Conference on Machine Learning 3319–3328 (PMLR, 2017).
Acknowledgements
FACEPE and FAPESP funded the project’s infrastructure through the Public Joint Call FAPESP-FACEPE 08/2022 - Support for Research in Applied Artificial Intelligence (AI). We thank both foundations for their support and contribution to the development of this research.
Author information
Authors and Affiliations
Contributions
W.O. and C.Z. conceived the experiment(s), W.O. experimented, C.P. streamlined the collection of PRs, and M.P., and M.A. assessed the prediction heatmaps to ensure clinical relevance. All authors analyzed the results and reviewed the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Oliveira, W., Albuquerque Santos, M., Burgardt, C.A.P. et al. Estimation of human age using machine learning on panoramic radiographs for Brazilian patients. Sci Rep 14, 19689 (2024). https://doi.org/10.1038/s41598-024-70621-1