Introduction

Age estimation in living individuals is important for clinical applications1,2,3 as well as in legal or forensic medicine investigations4 and sports5,6,7,8, but it is prone to uncertainty caused by the variation of human development9. Concerning biological age, one can draw insights from comprehensive studies10,11, but the use of biological development for estimating chronological age, as required in forensic applications, is still a topic of current research12,13. Recently, the increased flow of individuals into and across the European Union has raised interest in forensic age estimation for children, adolescents and young adults claiming to be minors but lacking valid identification documents14.

Current multi-factorial age estimation methods are based on a radiograph of the hand, a panoramic X-ray image of the teeth and computed tomography images of the clavicles15. To avoid exposure to ionizing radiation, there is growing interest in magnetic resonance imaging (MRI) for forensic age estimation5,16,17,18,19,20,21,22,23. This interest has led to developments such as a recently proposed fully automatic, machine-learning-based method24 that uses MR images of the left hand and wrist.

Compared to acquiring radiographs or computed tomography images, MRI has the drawback of considerably longer acquisition times, leading to increased examination costs and reduced patient comfort. Additionally, longer acquisition times are more prone to errors due to motion artefacts when acquiring images of children or adolescents. Therefore, short examination times are highly preferable.

A reduction of MR scanning time can be achieved by leaving out acquisition steps, often termed undersampling. The CAIPIRINHA (Controlled Aliasing In Parallel Imaging Results IN Higher Acceleration) undersampling strategy25 enables optimized acceleration for 3D image acquisition and is readily available on current MR scanners. To recover artefact-free images from a reduced amount of data, an advanced reconstruction strategy has to be applied. For this task total generalized variation (TGV) regularization26,27 has already demonstrated its applicability in various MRI studies28,29,30,31. We thus anticipate applicability with a high acceleration potential for volumetric MR data for age estimation by using CAIPIRINHA and TGV.

We conduct this feasibility study to investigate the degree of acceleration that can be applied to hand/wrist MRI for age estimation without significantly influencing the estimation outcome. This aims at determining limits and applicability of the proposed method by comparing the reliability of both human and automated evaluation, reflecting the potential of automatic methods to support radiologists in age estimation tasks.

Methods

Ethics Statement and Informed Consent

The study was performed in accordance with the Declaration of Helsinki and was approved by the ethical committee of the Medical University of Graz (EK 21–399 ex 09/10). All volunteers provided written informed consent. For underage participants, written consent from a legal guardian was additionally obtained.

Subjects

For this feasibility study 18 healthy male Caucasian volunteers between 13.8 and 23.2 years (mean = 17.2 y, median = 17.0 y) were recruited to acquire three-dimensional MR images of the left hand and wrist. The data of 15 volunteers were used to investigate implications of a reduction in acquisition time on resulting age estimates as described below. The data of the remaining three volunteers (see Table 1) were used to compare retrospectively undersampled images with actually acquired undersampled images.

Table 1 Overview of acquisition times and acceleration factors for simulated and acquired data.

MR Image Acquisition

MRI exams were performed using commercially available clinical 3 T MR scanners (Skyra/Prisma, Siemens Healthineers, Erlangen, Germany) and a conventional 20-channel receive-only head-neck coil (Siemens Healthineers, Erlangen, Germany). Volunteers were placed in prone position with outstretched left arm. The hand was weighted down using a sandbag to minimize movements.

For all 18 subjects T1-weighted 3D FLASH (Fast Low Angle SHot) VIBE (Volumetric Interpolated Breath hold Examination) measurements (TE/TR/FA = 4.06 ms/14 ms/15°, field-of-view = 129 mm \(\times \) 230 mm, two averages, acquisition matrix = 129 \(\times \) 230 and image matrix = 288 \(\times \) 512, 72 slices) of the left hand and wrist were acquired. The resulting 3D volumes had an image resolution of 0.45 mm \(\times \) 0.45 mm \(\times \) 0.90 mm and required an acquisition time of tAcqu = 3:46 minutes. For later comparisons with undersampled data, the images from this fully-sampled data are referred to as original images or data.

For three volunteers (see Table 1) accelerated measurements were additionally acquired using CAIPIRINHA with 12 calibration lines and acquisition times of 28, 15 and nine seconds.

For a better understanding, an overview of the study design is given in Fig. 1.

Figure 1

Schematic illustration of the applied method to investigate the reliability of age estimation based on undersampled data. Both original images and images reconstructed from undersampled data (AF: acceleration factor describing speed-up of acquisition time) are used for age estimation applying radiological and automatic estimation methods, respectively. Finally, the differences in the estimates are evaluated. Additionally, simulated data is compared to actually acquired data to show the validity of using retrospectively undersampled data.

Retrospective Undersampling of MRI Data

Undersampling MRI raw data is equivalent to not acquiring part of the data, i.e. leaving out data lines during the acquisition. Therefore, retrospectively undersampling conventionally acquired data by removing data lines from the fully-sampled data set prior to image reconstruction is a valid reference method to determine specific acceleration potential. The retrospective undersampling of the raw MR data was applied by simulating the commercially available CAIPIRINHA acquisition strategy with minimized noise amplification32.
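The principle of retrospective undersampling can be illustrated with a minimal sketch. It builds a CAIPIRINHA-like sampling mask on the two phase-encoding axes and zeroes all lines that would not have been acquired. The function names, the array layout (coils, nx, ny, nz) and the parameters `ry`, `rz`, `shift` and `n_calib` are illustrative assumptions; the sketch does not reproduce the vendor pattern or the noise-optimized pattern selection used in this study.

```python
import numpy as np

def caipi_like_mask(ny, nz, ry, rz, shift, n_calib):
    """Boolean (ny, nz) mask: True where a phase-encoding line is kept.

    Illustrative pattern only: regular undersampling by rz along kz,
    by ry along ky, with the ky pattern shifted by `shift` lines for each
    successive sampled kz position (controlled aliasing).
    """
    ky, kz = np.meshgrid(np.arange(ny), np.arange(nz), indexing="ij")
    mask = (kz % rz == 0) & ((ky - shift * (kz // rz)) % ry == 0)
    # Fully sampled central block that serves as calibration data.
    y0, z0 = (ny - n_calib) // 2, (nz - n_calib) // 2
    mask[y0:y0 + n_calib, z0:z0 + n_calib] = True
    return mask

def retrospective_undersample(kspace, mask):
    """Zero the non-acquired lines of k-space with shape (coils, nx, ny, nz)."""
    return kspace * mask[None, None, :, :]

# Hypothetical example sizes, not the matrix sizes of this study.
mask = caipi_like_mask(ny=128, nz=72, ry=2, rz=2, shift=1, n_calib=12)
kspace = np.ones((2, 4, 128, 72), dtype=complex)
undersampled = retrospective_undersample(kspace, mask)
```

Because the mask only discards samples, the retained lines are identical to those of the fully sampled acquisition, which is the property the retrospective approach relies on.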

For 15 volunteers, the CAIPIRINHA method with 12 calibration lines was applied retrospectively to simulate six different reduced acquisition times (tAcqu) between 29 and six seconds (see Table 1) providing a total of 105 data sets. Only non-averaged data were undersampled, which additionally reduced the required acquisition time by a factor of two, compared to the standard setting of performing two averages. In order to reduce the computational burden, the multi-channel data were reduced to a lower number of virtual coils via coil compression33. The virtual coil sensitivities were then estimated from the calibration data with the ESPIRiT method34. Image reconstruction was carried out for all simulated acceleration factors (AF) using the TGV method27, which considers smooth tissue variations and uses a dedicated optimization algorithm35 adapted for parallel computing. Images reconstructed from retrospectively undersampled data are referred to as simulated images or data when they are compared with original images.
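The idea behind coil compression can be sketched with a generic SVD-based variant, in which the physical channels are projected onto the dominant singular vectors of the calibration data. This is an assumption made for illustration and may differ from the cited compression method33; the function name and the array layout (coils first) are hypothetical.

```python
import numpy as np

def svd_coil_compression(kspace, calib, n_virtual):
    """Project multi-channel k-space onto n_virtual virtual coils.

    kspace, calib: complex arrays whose first axis indexes physical coils.
    Generic SVD-based sketch, not necessarily the method used in the study.
    """
    n_coils = kspace.shape[0]
    calib_matrix = calib.reshape(n_coils, -1)
    # Left singular vectors span the coil dimension, ordered by signal energy.
    u, _, _ = np.linalg.svd(calib_matrix, full_matrices=False)
    compress = u[:, :n_virtual].conj().T  # shape (n_virtual, n_coils)
    flat = compress @ kspace.reshape(n_coils, -1)
    return flat.reshape((n_virtual,) + kspace.shape[1:])

# Synthetic example: 8 coils that are mixtures of only 2 underlying signals.
rng = np.random.default_rng(0)
sources = rng.standard_normal((2, 120)) + 1j * rng.standard_normal((2, 120))
mixing = rng.standard_normal((8, 2)) + 1j * rng.standard_normal((8, 2))
kspace = (mixing @ sources).reshape(8, 4, 5, 6)
compressed = svd_coil_compression(kspace, kspace, n_virtual=2)
```

In this synthetic case two virtual coils retain the full signal energy; for measured data the number of virtual coils trades computation time against a small loss of information.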

For the remaining three volunteers, the undersampling patterns were matched exactly to the pattern of the additionally acquired accelerated measurements, simulating acquisition times of 28, 15 and eight seconds, respectively.

The software for image reconstruction is provided online at https://github.com/IMTtugraz/AVIONIC.

Comparison of Simulated and Acquired Data

For three volunteers (see Table 1), we compared acquired undersampled images with the corresponding simulated images. A comparison of changes of specific image features with increasing undersampling factor in both acquired and simulated data serves the purpose of showing the validity of using retrospectively undersampled data for this study.

Skeletal Rating

Skeletal age was rated independently using two different methods. A radiologist with more than five years of expertise in forensic applications (R1) and a pediatric radiologist with five years of experience in bone age estimation (R2) independently evaluated whether the quality of the simulated images was adequate for reproducible radiological age estimation. For MRI-based radiological age estimation, radiologists applied the method proposed by Greulich and Pyle36 (GP) to the MR images evaluated as assessable. The GP method, originally developed for age estimation based on radiographs, was verified to be applicable for age estimation from MR images, reporting errors on the same scale as inter-rater variations37. To avoid biased age estimates the MR images were anonymized and randomized irrespective of the acceleration factor.

To estimate general limits of radiological assessability, an initial analysis was performed after acquisitions of the first five volunteers. The acquired MR data were undersampled according to the values in Table 1. A radiological evaluation rated four out of five data sets with acquisition times below 15 seconds as unusable for a non-ambiguous radiological age estimation. Therefore, for radiological evaluation only original data and simulated image stacks with acquisition times of 29 and 15 seconds – a total of 45 data sets – were presented to radiologists R1 and R2 for age estimation.

The second skeletal age rating was performed using the fully automated age estimation method proposed by Urschler et al.24 extended by improving landmark localization accuracy38 and introducing a novel deep neural network based age estimator39. This setup was used solely as an age predictor, i.e. without using data from the present study to further train the model or tune its parameters.

Statistical Analysis

The main focus of this study was on the reliability of age estimation with decreasing acquisition time and not on the actual absolute results of age estimation. Therefore, we analyzed the difference introduced into the estimated age with decreasing acquisition time to assess reliability. For this analysis the reference age for comparison was the age estimated by each observer using the original images. The difference was then calculated by subtracting the age estimated from original data from the age estimated from simulated data for results of radiologist R1 and R2 (\({\Delta {\rm{Age}}}_{R1}\), \({\Delta {\rm{Age}}}_{R2}\)) and the automatic age estimation (\({\Delta {\rm{Age}}}_{{autom}}\)):

$${\Delta }{{\rm{Age}}}_{R1}={{\rm{Age}}}_{R1}-{{\rm{Age}}}_{R1,orig}$$
$${\Delta }{{\rm{Age}}}_{R2}={{\rm{Age}}}_{R2}-{{\rm{Age}}}_{R2,orig}$$
$${\Delta }{{\rm{Age}}}_{autom}={{\rm{Age}}}_{autom}-{{\rm{Age}}}_{autom,orig}$$

The standard deviation of the signed differences (SSD) of \(\Delta {\rm{Age}}\) was used as a measure of the reliability of the age estimation, and the mean of signed differences (MSD) was used to identify potential systematic errors. Additionally, the intra-class correlation coefficient (ICC) was calculated between the age estimates based on original images and the estimates from simulated data sets.
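As a minimal sketch, MSD and SSD can be computed from paired age estimates as follows. The function name and the example ages are hypothetical, and the use of the sample standard deviation (normalization by N − 1, the default of MATLAB's `std`) is an assumption, since the normalization is not stated here.

```python
import numpy as np

def signed_difference_stats(age_sim, age_orig):
    """Return (MSD, SSD) of the signed differences age_sim - age_orig."""
    delta = np.asarray(age_sim, dtype=float) - np.asarray(age_orig, dtype=float)
    msd = delta.mean()        # systematic offset
    ssd = delta.std(ddof=1)   # spread; ddof=1 is an assumed normalization
    return msd, ssd

# Hypothetical estimates in years, not data from this study.
age_orig = [14.0, 15.5, 17.0, 19.0]
age_sim = [14.5, 15.5, 16.5, 19.0]
msd, ssd = signed_difference_stats(age_sim, age_orig)
```

An MSD close to zero with a small SSD indicates that undersampling neither shifts nor scatters the estimates relative to the original images.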

The inter-rater reproducibility between all three observers, i.e. R1, R2 and the automatic age estimation method (A), was determined by calculating ICC and Bland-Altman limits of agreement (LOA) between corresponding age estimates. The inter-rater reproducibility between the radiological estimation and the automatic method thereby provides a measure of conformity between the two different age estimation methods. This information may help to evaluate the potential to combine them into a hybrid of manual and fully automatic age estimation, similar to an approach recently proposed for volumetry in oncology40.
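A Bland-Altman analysis of two raters can be sketched as follows, assuming the conventional definition of the limits of agreement as the mean difference ± 1.96 standard deviations of the differences. The function name and the example values are hypothetical, and the exact LOA convention used for the reported values is not specified here.

```python
import numpy as np

def bland_altman(rater_a, rater_b):
    """Return pairwise means, differences, bias and limits of agreement."""
    a = np.asarray(rater_a, dtype=float)
    b = np.asarray(rater_b, dtype=float)
    diff = a - b
    mean_pair = (a + b) / 2.0
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)  # conventional 95% limits (assumed)
    return mean_pair, diff, bias, (bias - half_width, bias + half_width)

# Hypothetical age estimates in years, not data from this study.
r1 = [15.0, 16.0, 17.0, 18.0]
r2 = [14.5, 16.5, 17.0, 18.0]
mean_pair, diff, bias, (loa_low, loa_high) = bland_altman(r1, r2)
```

Plotting `diff` against `mean_pair` with horizontal lines at `bias`, `loa_low` and `loa_high` yields the usual Bland-Altman plot.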

All statistical analyses were performed using MATLAB (R2014b, The MathWorks Inc., Natick, MA, USA).

Data Availability

The acquired MRI data sets generated and/or analyzed during the current study are not publicly available for data privacy reasons. The participants did not explicitly give their consent to freely distribute their imaging data, albeit anonymized. However, quantitative measures derived from the imaging data will be made available as supplementary material to this publication.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Results

Image Reconstruction and Image Quality

Figure 2 shows representative images of a central slice of the left hand and wrist of one volunteer (14.2 y) for the original data set and simulated acquisition times of 29, 15 and six seconds. Qualitatively, for an acquisition time of at least 15 seconds no severe artefacts can be identified; however, images with an acquisition time of 15 seconds already feature image blurring, which increases with the acceleration factor. For an acquisition time of six seconds, differences between original and simulated data become clearly visible.

Figure 2

Exemplary images of a selected slice of one volunteer (14.2 y) for originally acquired data, Iorig, and simulated images ISim29, ISim15 and ISim6. Differences between original and reconstructed images are additionally displayed for selected image profiles.

Additionally, the difference between original and simulated images is shown for an image profile line covering bone, muscle tissue and joint cartilage. Deviations from the original image increase with the acceleration factor and become pronounced for larger muscle regions. In general, the reduction of available data leads to blurring and loss of morphological details in the resulting images. This blurring is observable for example as overlap of the muscle tissue with metacarpal bones or the broadening of the joint cartilage of the fifth digit (first visible for tAcqu = 15 s) producing positive peaks in the difference of the profile lines.

Assessability of Simulated MR Images

All images with acquisition times of 29 and 15 seconds were rated as suitable for age estimation by both radiologists. The automatic age estimation provided age estimates for all data sets.

Variability and Reliability of Rating

Figure 3 visualizes the influence of the reduction of acquisition time on age estimation by showing the difference to the age estimates based on the original data for radiologists R1 and R2 and the automatic method (the values for all age estimates can be found in Supplementary Table S1 online). For the radiological evaluation, standard deviations of signed differences (SSD) introduced by simulated acceleration were 0.57 y and 0.46 y for acquisition times of 29 and 15 seconds, respectively, for R1 and 0.46 y and 0.44 y for R2; the corresponding values for the mean deviations (MSD) were −0.10 y and 0.00 y for R1 and −0.13 y and −0.07 y for R2. For automatic age estimation, SSD values increased with the acceleration factor and reached a maximum of 0.51 y for tAcqu = 6 s, while MSD values lay between 0.10 and 0.21 years. All SSD and MSD values are provided in Table 2.

Figure 3

Differences to age estimates based on original data set introduced by a reduction of the acquisition time. Differences are shown for (a) R1, (b) R2 and (c) the automatic age estimation method as a function of the acquisition time. Lines in (a) and (b) mark the MSD value for each acceleration factor (exact values are shown in Table 2).

Table 2 Comparison between ratings of radiological and automatic age estimation: reliability of age estimates is reported as correlation with estimates based on fully-sampled data sets.

The values for the ICC in Table 2 show high intra-class correlation for both applied age estimation methods. A comparison to original age estimates yields a minimum ICC of 0.96 for all evaluated data sets; the values for inter-rater variability lay between 0.91 and 0.99. All results were highly significant with p < 0.000001 for all values. The Bland-Altman plots in Fig. 4 show high inter-rater agreement. The mean values of the Bland-Altman analysis lie between 0.03 and 0.33 years and suggest no systematic error in the analysis. Radiological raters R1 and R2 show the best agreement with LOA = 1.02 y, whereas the agreement between the radiological and the automatic method yields LOA = 1.5 y for R1 and LOA = 1.14 y for R2.

Figure 4

Bland-Altman plots for inter-rater agreement. Agreement is shown between (a) R1 and R2, (b) R1 and the automatic method (A) and (c) R2 and the automatic method as a function of the acquisition time. µR1,R2, µR1,A and µR2,A describe the mean value of the age estimates of the respective raters; Δ is the difference between the respective ratings.

The automatic method as well as both radiologists estimated the oldest volunteer (23.2 y) to be over 18 y for all acceleration factors. The evaluation by the radiologists yielded 19 y for all acceleration factors for this volunteer – the maximally assessable age – and the automatic estimation provided estimates between 18.3 y and 18.9 y.

Comparing Simulated and Acquired Data

Figure 5 compares simulated data with actually acquired undersampled data of three volunteers (15.75, 18.85 and 21.61 years from top to bottom) showing image details for simulated (upper rows) and acquired (lower rows) accelerated MRI and acquisition times of 29, 15 and eight seconds. For the youngest volunteer epiphyseal gaps are still visible for an acquisition time of eight seconds and the hyperintense structure marked by circles is blurred equally for simulated and acquired images with decreasing acquisition time. Structures marked in the images of the remaining two volunteers become noticeably blurred for an acquisition time of 15 seconds and disappear for further acceleration; again, this behavior can be seen in both simulated and acquired data.

Figure 5

Comparison of simulated (upper rows) and acquired (lower rows) undersampled data for three different volunteers (15.75, 18.85 and 21.61 years from top to bottom) and locations. Arrows mark structures relevant for age estimation, while circles highlight structures changing their appearance with decreasing acquisition time in both simulated and acquired data.

Discussion

The presented results suggest that a radiological analysis can provide reliable age estimates based on hand/wrist MRI using an acquisition time of only 15 seconds. This corresponds to an acceleration factor of approximately 7.5 relative to a single-average acquisition (about 113 seconds), or to a roughly 15-fold reduction compared to the original two-average acquisition time of 3:46 minutes. For this duration no relevant artefacts occurred in the simulated images and all data sets were deemed assessable and yielded a maximum SSD of 0.55 years (shown in Fig. 3). This is in the range of reported errors for the radiological examination of skeletal development41. With decreasing acquisition time, automatic age estimation showed an increasing deviation compared to the estimation from the original data set. However, for a simulated duration of six seconds the standard deviation was still only 0.51 years (see Table 2). Both estimation methods yielded small MSD values suggesting that age estimates are not influenced by a systematic offset.
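The factor of approximately 7.5 can be reproduced from the acquisition parameters given in the Methods if it is taken relative to a single-average acquisition, since only non-averaged data were undersampled. The short sketch below makes this arithmetic explicit; the function name is hypothetical and the interpretation of the reference time is an assumption.

```python
def acceleration_factors(t_full_s, n_averages, t_accel_s):
    """Return (factor vs. single-average scan, factor vs. full averaged scan)."""
    t_single_s = t_full_s / n_averages
    return t_single_s / t_accel_s, t_full_s / t_accel_s

t_full_s = 3 * 60 + 46  # original acquisition time of 3:46 minutes in seconds
af_undersampling, af_total = acceleration_factors(t_full_s, n_averages=2, t_accel_s=15)
print(f"undersampling factor: {af_undersampling:.2f}, total speed-up: {af_total:.2f}")
```

Under this reading, omitting the second average contributes a factor of two and the undersampling itself a factor of about 7.5, so the overall reduction relative to 3:46 minutes is about 15-fold.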

With increasing acceleration factor, images reconstructed from undersampled data tend to appear blurry while noise is suppressed and fine structures become less distinctive, creating an unusual image representation for radiologists. The quality of the simulated images allowed a radiological analysis for acquisition times down to 15 seconds and age estimates for the analyzed data sets were close to identical. The automatic method provided reliable results even for the shortest acquisition time of six seconds. This is a remarkable reduction of the acquisition time as existing age estimation studies at a field strength of 3 Tesla can require acquisition times of up to six minutes for the wrist only7. The potential acceleration is markedly higher than acceleration factors reported in a recent study by Terada et al. reducing the acquisition time by a factor of 4 from 2:44 minutes to 41 seconds42. However, our results cannot easily be compared to the work of Terada et al., since they used a low-field MR scanner at 0.3 Tesla. A lower field strength generally bears the disadvantage of lower SNR but also allows shorter acquisition times due to shorter T1 relaxation times. In addition, Terada et al. applied an optimized undersampling pattern for their compressed sensing-based approach, which is not commercially available.

The comparison of standard deviations of radiological and automatic analysis methods has to be interpreted carefully, since the minimal deviation that may occur using the GP atlas-matching scheme is 0.5 years, while the automatic method provides a continuous age estimate. Furthermore, contrary to the modern-day reference population of the automatic method, the GP scheme uses a different reference population consisting of Caucasian volunteers born in the 1930s, which may be considered outdated due to changes in nutritional behavior. From a methodological point of view, the difference between the acceptable acceleration factor for an analysis by a radiologist and that for the automatic method could be explained by the fact that the automatic age estimation algorithm analyses the entire 3D data set simultaneously. This avoids influences of single artefacts mimicking a partial closure of the epiphyseal gap in a 2D representation.

The main aim of this study was to test reliability. However, the oldest volunteer (23.15 years) was included to test whether image reconstruction may introduce misleading image features causing an estimation of under 18 years – a legally important age threshold indicating majority age in many countries. Based on the atlas, the maximum age a radiologist can allocate is 19 years. In the oldest volunteer this maximum age was allocated to image stacks of all acceleration factors. Likewise, the automatic estimation provided estimates over 18 y for all acceleration factors, which suggests that the chosen undersampling and reconstruction strategies are robust against misleading artefacts for the simulated acceleration factors.

The validity of using retrospectively undersampled data in this study was shown by a comparison of simulated and acquired images. The simulation of an accelerated acquisition removes data lines that are not acquired during an actual acquisition. Therefore, an agreement between simulated and acquired data can be anticipated. Moreover, retrospective undersampling represents a worst-case simulation since the reduced amount of data is extracted from a long acquisition during which more patient movement can occur. For this reason the additional acquisition of accelerated data sets was only performed for a small number of volunteers.

Our work is based on an undersampling scheme readily available on current MR scanners and therefore does not require comprehensive knowledge on undersampling strategies or MR sequence programming. The same applies to the image reconstruction algorithm, which is available to the public in an online repository. This allows for an easy adoption of our proposed method. It is also noteworthy that the automatic method did not require additional training and could readily be applied to undersampled data in its original state. Using the concept of systematically increasing the undersampling of the available data, the feasibility of our approach could already be shown for the relatively small sample size used in this study. We expect to reproduce these results with more data sets, which are currently being acquired.

The potential decrease in acquisition time presented in this study is an important step towards establishing MRI as a standard method for age estimation. On successful transfer of this approach to MR acquisitions of third molars and clavicular epiphyses, the application of MRI for multi-factorial age estimation could be promoted even further due to the elimination of the drawback of time consumption. Furthermore, this also translates to a potential reduction of the cost of using this ionizing radiation-free imaging modality for age estimation.

In conclusion, we showed the reliability of image data undersampled with the CAIPIRINHA technique in combination with TGV-based reconstruction for skeletal age estimation. A reduction of the acquisition time to 15 seconds for MR acquisitions of the hand and wrist was found to produce images interpretable using both a radiological and an automatic age estimation method. Furthermore, the high correlation between the two methods shows the potential of automatic methods to support radiologists in age estimation investigations.