Reducing acquisition time for MRI-based forensic age estimation

Radiology-based estimation of a living person’s unknown age has recently attracted increasing attention due to large numbers of undocumented immigrants entering Europe. To avoid the application of X-ray-based imaging techniques, magnetic resonance imaging (MRI) has been suggested as an alternative imaging modality. Unfortunately, MRI requires prolonged acquisition times, which potentially represents an additional stressor for young refugees. To eliminate this shortcoming, we investigated the degree of reduction in acquisition time that still led to reliable age estimates. Two radiologists randomly assessed original images and two sets of retrospectively undersampled data of 15 volunteers (N = 45 data sets) applying an established radiological age estimation method to images of the hand and wrist. Additionally, a neural network-based age estimation method analyzed four sets of further undersampled images from the 15 volunteers (N = 105 data sets). Furthermore, we compared retrospectively undersampled and acquired undersampled data for three volunteers. To assess reliability with increasing degree of undersampling, intra-rater and inter-rater agreement were analyzed computing signed differences and intra-class correlation. While our findings have to be confirmed by a larger prospective study, the results from both radiological and automatic age estimation showed that reliable age estimation was still possible for acquisition times of 15 seconds.

We conducted this feasibility study to investigate the degree of acceleration that can be applied to hand/wrist MRI for age estimation without significantly influencing the estimation outcome. The aim was to determine the limits and applicability of the proposed method by comparing the reliability of human and automated evaluation, which also reflects the potential of automatic methods to support radiologists in age estimation tasks.

Ethics Statement and Informed Consent.
The study was performed in accordance with the Declaration of Helsinki and was approved by the ethical committee of the Medical University of Graz (EK 21-399 ex 09/10). All volunteers provided written informed consent. For underage participants, written consent from a legal guardian was additionally obtained.
Subjects. For this feasibility study, 18 healthy male Caucasian volunteers between 13.8 and 23.2 years (mean = 17.2 y, median = 17.0 y) were recruited to acquire three-dimensional MR images of the left hand and wrist. The data of 15 volunteers were used to investigate implications of a reduction in acquisition time on resulting age estimates as described below. The data of the remaining three volunteers (see Table 1) were used to compare retrospectively undersampled images with actually acquired undersampled images.
MR Image Acquisition. MRI exams were performed using commercially available clinical 3 T MR scanners (Skyra/Prisma, Siemens Healthineers, Erlangen, Germany) and a conventional 20-channel receive-only head-neck coil (Siemens Healthineers, Erlangen, Germany). Volunteers were placed in prone position with outstretched left arm. The hand was weighted down using a sandbag to minimize movements.
For all 18 subjects, T1-weighted 3D FLASH (Fast Low Angle SHot) VIBE (Volumetric Interpolated Breath-hold Examination) measurements (TE/TR/FA = 4.06 ms/14 ms/15°, field of view = 129 mm × 230 mm, two averages, acquisition matrix = 129 × 230 and image matrix = 288 × 512, 72 slices) of the left hand and wrist were acquired. The resulting 3D volumes had an image resolution of 0.45 mm × 0.45 mm × 0.90 mm and required an acquisition time of t_Acqu = 3:46 minutes. For later comparisons with undersampled data, the images from this fully-sampled data are referred to as original images or data.
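The stated in-plane resolution follows from dividing the field of view by the image matrix. The short check below uses only the values quoted above; the variable names are illustrative, and reading 0.45 mm as the interpolated (reconstructed) grid rather than the acquired one is an assumption based on the difference between acquisition and image matrix.

```python
# Nominal in-plane voxel size = field of view / matrix size (values from the text).
fov_mm = (129.0, 230.0)          # field of view in mm
acq_matrix = (129, 230)          # acquisition matrix
image_matrix = (288, 512)        # reconstructed image matrix

# Grid spacing of the acquired data and of the reconstructed images.
acquired_mm = tuple(f / m for f, m in zip(fov_mm, acq_matrix))
reconstructed_mm = tuple(f / m for f, m in zip(fov_mm, image_matrix))

print([round(v, 2) for v in acquired_mm])       # spacing of acquired samples in mm
print([round(v, 2) for v in reconstructed_mm])  # spacing of image voxels in mm
```

Under this reading, the reported 0.45 mm × 0.45 mm corresponds to the reconstructed image grid, while the acquired sampling grid is coarser.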
For three volunteers (see Table 1) accelerated measurements were additionally acquired using CAIPIRINHA with 12 calibration lines and acquisition times of 28, 15 and nine seconds.
For a better understanding, an overview of the study design is given in Fig. 1.

Retrospective Undersampling of MRI Data.
Undersampling MRI raw data is equivalent to not acquiring part of the data, i.e. leaving out data lines during the acquisition. Therefore, retrospectively undersampling conventionally acquired data by removing data lines from the fully-sampled data set prior to image reconstruction is a valid reference method to determine specific acceleration potential. The retrospective undersampling of the raw MR data was performed by simulating the commercially available CAIPIRINHA acquisition strategy with minimized noise amplification [32]. For 15 volunteers, the CAIPIRINHA method with 12 calibration lines was applied retrospectively to simulate six different reduced acquisition times (t_Acqu) between 29 and six seconds (see Table 1), providing a total of 105 data sets. Only non-averaged data were undersampled, which additionally reduced the required acquisition time by a factor of two compared to the standard setting of performing two averages. In order to reduce the computational burden, the multi-channel data were reduced to a lower number of virtual coils via coil compression [33]. The virtual coil sensitivities were then estimated from the calibration data with the ESPIRiT method [34]. Image reconstruction was carried out for all simulated acceleration factors (AF) using the TGV method [27], which considers smooth tissue variations and uses a dedicated optimization algorithm [35] adapted for parallel computing. When images reconstructed from retrospectively undersampled data are compared to original images, they are referred to as simulated images or data. For the remaining three volunteers, the undersampling patterns were matched exactly to the pattern of the additionally acquired accelerated measurements, simulating acquisition times of 28, 15 and eight seconds, respectively.
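The idea of retrospective undersampling can be illustrated with a minimal NumPy sketch. It is not the vendor's CAIPIRINHA pattern and not the noise-minimized variant cited above: `caipi_like_mask` and `undersample` are hypothetical helpers that build a regular, shifted sampling grid in the two phase-encoding directions with a fully sampled central calibration block, and then zero out every k-space line outside that grid. The array layout, the random data and the acceleration settings are assumptions chosen only for illustration; coil compression, ESPIRiT and TGV reconstruction are not covered.

```python
import numpy as np

def caipi_like_mask(ny, nz, ry, rz, shift, n_calib):
    """Boolean (ny, nz) mask: every ry-th ky line and every rz-th kz line,
    with the kz pattern shifted by `shift` between consecutive sampled ky
    lines, plus a fully sampled n_calib x n_calib block at the centre."""
    mask = np.zeros((ny, nz), dtype=bool)
    for i, ky in enumerate(range(0, ny, ry)):
        offset = (i * shift) % rz          # CAIPIRINHA-style shift along kz
        mask[ky, offset::rz] = True
    cy, cz, h = ny // 2, nz // 2, n_calib // 2
    mask[cy - h:cy + h, cz - h:cz + h] = True   # calibration region
    return mask

def undersample(kspace, mask):
    """Discard (zero) all phase-encoding lines outside the mask.
    kspace: (coils, nx, ny, nz); the readout direction nx stays complete."""
    return kspace * mask[None, None, :, :]

# Synthetic stand-in for fully sampled multi-coil raw data (illustrative only).
rng = np.random.default_rng(0)
shape = (4, 16, 48, 24)
kspace = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

mask = caipi_like_mask(ny=48, nz=24, ry=2, rz=2, shift=1, n_calib=12)
kspace_us = undersample(kspace, mask)
print(f"retained fraction of phase-encoding lines: {mask.mean():.3f}")
```

Because the retained lines are a strict subset of the fully sampled acquisition, the same data set can be reused to simulate several acquisition times by changing only the mask parameters.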
The software for image reconstruction is provided online at https://github.com/IMTtugraz/AVIONIC.

Comparison of Simulated and Acquired Data.
For three volunteers (see Table 1), we compared acquired undersampled images with the corresponding simulated images. A comparison of changes of specific image features with increasing undersampling factor in both acquired and simulated data serves the purpose of showing the validity of using retrospectively undersampled data for this study.
Skeletal Rating. Skeletal age was rated independently using two different methods. A radiologist with more than five years of expertise in forensic applications (R1) and a pediatric radiologist with five years of experience in bone age estimation (R2) independently evaluated whether the quality of the simulated images was adequate for reproducible radiological age estimation. For MRI-based radiological age estimation, the radiologists applied the method proposed by Greulich and Pyle (GP) [36] to the MR images evaluated as assessable. The GP method, originally developed for age estimation based on radiographs, was verified to be applicable for age estimation from MR images, with reported errors on the same scale as inter-rater variations [37]. To avoid biased age estimates, the MR images were anonymized and randomized irrespective of the acceleration factor. To estimate general limits of radiological assessability, an initial analysis was performed after the acquisitions of the first five volunteers. The acquired MR data were undersampled according to the values in Table 1. A radiological evaluation rated four out of five data sets with acquisition times below 15 seconds as unusable for an unambiguous radiological age estimation. Therefore, for radiological evaluation only original data and simulated image stacks with acquisition times of 29 and 15 seconds (a total of 45 data sets) were presented to radiologists R1 and R2 for age estimation.
The second skeletal age rating was performed using the fully automated age estimation method proposed by Urschler et al. [24], extended by improved landmark localization accuracy [38] and a novel deep neural network-based age estimator [39]. This setup was used solely as an age predictor, i.e. without using data from the present study to further train the model or tune its parameters.
Statistical Analysis. The main focus of this study was on the reliability of age estimation with decreasing acquisition time and not on the actual absolute results of age estimation. Therefore, we analyzed the difference introduced into the estimated age with decreasing acquisition time to assess reliability. For this analysis the reference age for comparison was the age estimated by each observer using the original images. The difference was then calculated by subtracting the age estimated from original data from the age estimated from simulated data for the results of radiologists R1 and R2 (ΔAge_R1, ΔAge_R2) and the automatic age estimation (ΔAge_autom):
$$\Delta\mathrm{Age}_{x} = \mathrm{Age}_{x}^{\mathrm{simulated}} - \mathrm{Age}_{x}^{\mathrm{original}}, \quad x \in \{\mathrm{R1}, \mathrm{R2}, \mathrm{autom}\}$$
The standard deviation of the signed differences (SSD) of ΔAge was used as a measure for the reliability of the age estimation, and the mean of signed differences (MSD) was used to identify potential systematic errors. Additionally, the intra-class correlation coefficient (ICC) was calculated between the age estimates based on original images and the estimates from simulated data sets.
Figure 1. Schematic illustration of the applied method to investigate the reliability of age estimation based on undersampled data. Both original images and images reconstructed from undersampled data (AF: acceleration factor describing speed-up of acquisition time) are used for age estimation applying radiological and automatic estimation methods, respectively. Finally, the differences in the estimates are evaluated. Additionally, simulated data is compared to actually acquired data to show the validity of using retrospectively undersampled data.
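The signed-difference measures defined in the Statistical Analysis paragraph can be sketched in a few lines of NumPy. The function name and the example ages are made up for illustration, and using the sample standard deviation (`ddof=1`) is an assumption, since the text does not state which estimator was used.

```python
import numpy as np

def signed_difference_stats(age_simulated, age_original):
    """Return the signed differences (simulated minus original), their mean
    (MSD) and their sample standard deviation (SSD, ddof=1 is an assumption)."""
    diff = np.asarray(age_simulated, dtype=float) - np.asarray(age_original, dtype=float)
    msd = diff.mean()
    ssd = diff.std(ddof=1)
    return diff, msd, ssd

# Hypothetical estimates of one observer for four volunteers (years).
age_original = [14.0, 15.5, 17.0, 19.0]
age_simulated = [14.5, 15.5, 16.5, 19.0]

diff, msd, ssd = signed_difference_stats(age_simulated, age_original)
print("signed differences:", diff)
print(f"MSD = {msd:.2f} y, SSD = {ssd:.2f} y")
```

An MSD close to zero indicates that undersampling does not shift the estimates systematically, while the SSD captures how much individual estimates scatter around the original rating.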
The inter-rater reproducibility between all three observers, i.e. R1, R2 and the automatic age estimation method (A), was determined by calculating ICC and Bland-Altman limits of agreement (LOA) between corresponding age estimates. The inter-rater reproducibility between the radiological estimation and the automatic method thereby provides a measure of conformity between the two different age estimation methods. This information may help to evaluate the potential of combining them into a hybrid between manual and fully automatic age estimation, similar to an approach recently proposed for volumetry in oncology [40].
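A minimal sketch of the two agreement measures named above is given below. The text does not specify which ICC form was used or how the LOA were defined, so both choices here are assumptions: `icc_2_1` implements the two-way random-effects, absolute-agreement, single-measure form ICC(2,1), and `bland_altman` uses the common convention of bias ± 1.96 standard deviations of the differences. All ratings are made up.

```python
import numpy as np

def bland_altman(a, b):
    """Bias and 95% limits of agreement (bias +/- 1.96 * SD of differences)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return bias, bias - half_width, bias + half_width

def icc_2_1(ratings):
    """ICC(2,1) from an (n_subjects, k_raters) array via two-way ANOVA."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical age estimates (years) of two raters for four volunteers.
rater_1 = [14.0, 16.0, 18.0, 19.0]
rater_2 = [14.5, 15.5, 18.0, 19.5]

bias, loa_low, loa_high = bland_altman(rater_1, rater_2)
icc = icc_2_1(np.column_stack([rater_1, rater_2]))
print(f"bias = {bias:.3f} y, LOA = [{loa_low:.2f}, {loa_high:.2f}] y, ICC = {icc:.3f}")
```

The absolute-agreement form penalizes a constant offset between raters, which makes it a conservative choice when two methods are meant to be used interchangeably.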
Data Availability. The acquired MRI data sets generated and/or analyzed during the current study are not publicly available for data privacy reasons. The participants did not explicitly consent to the free distribution of their imaging data, even in anonymized form. However, quantitative measures derived from the imaging data will be made available as supplementary material to this publication.
Ethical approval. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Results
Image Reconstruction and Image Quality. Figure 2 shows representative images of a central slice of the left hand and wrist of one volunteer (14.2 y) for the original data set and simulated acquisition times of 29, 15 and six seconds. Qualitatively, for an acquisition time of at least 15 seconds no severe artefacts can be identified; however, images with an acquisition time of 15 seconds already feature image blurring, which increases with the acceleration factor. For an acquisition time of six seconds, differences between original and simulated data become clearly visible. Additionally, the difference between original and simulated images is shown for an image profile line covering bone, muscle tissue and joint cartilage. Deviations from the original image increase with the acceleration factor and become pronounced for larger muscle regions. In general, the reduction of available data leads to blurring and loss of morphological details in the resulting images. This blurring is observable, for example, as overlap of the muscle tissue with metacarpal bones or as broadening of the joint cartilage of the fifth digit (first visible for t_Acqu = 15 s), producing positive peaks in the difference of the profile lines.
Assessability of Simulated MR Images. All images with acquisition times of 29 and 15 seconds were rated as suitable for age estimation by both radiologists. The automatic age estimation provided age estimates for all data sets. Figure 3 visualizes the influence of the reduction of acquisition time on age estimation by showing the difference to the age estimates based on the original data for radiologists R1 and R2 and the automatic method (the values for all age estimates can be found in Supplementary Table S1 online). For the radiological evaluation, the standard deviations of signed differences (SSD) introduced by simulated data are given in Table 2.

Variability and Reliability of Rating.
The values for the ICC in Table 2 show high intra-class correlation for both applied age estimation methods. A comparison to original age estimates yields a minimum ICC of 0.96 for all evaluated data sets; the values for inter-rater variability lie between 0.91 and 0.99. All results were highly significant, with p < 0.000001 for all values. The Bland-Altman plots in Fig. 4 show high inter-rater agreement. The mean values of the Bland-Altman analysis lie between 0.03 and 0.33 years and suggest no systematic error in the analysis. Radiological raters R1 and R2 show the best agreement with LOA = 1.02 y, whereas the agreement between the radiological and the automatic method yields LOA = 1.5 y for R1 and LOA = 1.14 y for R2.
The automatic method as well as both radiologists estimated the oldest volunteer (23.2 y) to be over 18 y for all acceleration factors. For this volunteer, the evaluation by the radiologists yielded 19 y (the maximally assessable age) for all acceleration factors, and the automatic estimation provided estimates between 18.3 y and 18.9 y.

Discussion
The presented results suggest that a radiological analysis can provide reliable age estimates based on hand/wrist MRI using an acquisition time of only 15 seconds. This corresponds to an acceleration factor of approximately 7.5 relative to the non-averaged acquisition, or a roughly 15-fold reduction compared to the original two-average acquisition time of 3:46 minutes. For this duration no relevant artefacts occurred in the simulated images, and all data sets were deemed assessable and yielded a maximum SSD of 0.55 years (shown in Fig. 3). This is in the range of reported errors for the radiological examination of skeletal development [41]. With decreasing acquisition time, automatic age estimation showed an increasing deviation compared to the estimation from the original data set. However, for a simulated duration of six seconds the standard deviation was still only 0.51 years (see Table 2). Both estimation methods yielded small MSD values, suggesting that age estimates are not influenced by a systematic offset.
Table 2. Comparison between ratings of radiological and automatic age estimation: reliability of age estimates is reported as correlation with estimates based on fully-sampled data sets. ICC: Intra-class correlation coefficient, SSD/MSD: Standard deviation/mean of signed differences.
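The relation between the quoted acceleration factor and the acquisition times can be checked with simple arithmetic. The sketch assumes that the factor of approximately 7.5 refers to k-space undersampling alone, with a further factor of two coming from omitting the second average as described in the methods; `to_seconds` is a hypothetical helper.

```python
def to_seconds(mm_ss):
    """Convert a 'minutes:seconds' string into seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)

t_original = to_seconds("3:46")    # fully sampled, two averages
t_single_average = t_original / 2  # fully sampled, one average
t_accelerated = 15                 # simulated acquisition time in seconds

undersampling_factor = t_single_average / t_accelerated
overall_reduction = t_original / t_accelerated

print(f"undersampling factor: {undersampling_factor:.1f}")
print(f"overall reduction:    {overall_reduction:.1f}")
```

Under this assumption the two numbers are consistent: about 7.5 from undersampling and about 15 overall once the omitted average is included.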
With increasing acceleration factor, images reconstructed from undersampled data tend to appear blurry while noise is suppressed and fine structures become less distinctive, creating an unusual image representation for radiologists. The quality of the simulated images allowed a radiological analysis for acquisition times down to 15 seconds, and age estimates for the analyzed data sets were close to identical. The automatic method provided reliable results even for the shortest acquisition time of six seconds. This is a remarkable reduction of the acquisition time, as existing age estimation studies at a field strength of 3 Tesla can require acquisition times of up to six minutes for the wrist only [7]. The potential acceleration is markedly higher than the acceleration factors reported in a recent study by Terada et al., who reduced the acquisition time by a factor of 4 from 2:44 minutes to 41 seconds [42]. However, our results cannot easily be compared to the work of Terada et al., since they used a low-field MR scanner at 0.3 Tesla. A lower field strength generally bears the disadvantage of lower SNR but also allows shorter acquisition times due to shorter T1 relaxation times. In addition, Terada et al. applied an optimized undersampling pattern for their compressed sensing-based approach, which is not commercially available.
The comparison of standard deviations of radiological and automatic analysis methods has to be interpreted carefully, since the minimal deviation that may occur using the GP atlas-matching scheme is 0.5 years, while the automatic method provides a continuous age estimate. Furthermore, in contrast to the modern-day reference population of the automatic method, the GP scheme uses a reference population consisting of Caucasian volunteers born in the 1930s, which may be considered outdated due to changes in nutritional behavior. From a methodological point of view, the difference between the acceptable acceleration factor for an analysis by a radiologist and that for the automatic method could be explained by the fact that the automatic age estimation algorithm analyzes the entire 3D data set simultaneously. This avoids influences of single artefacts mimicking a partial closure of the epiphyseal gap in a 2D representation.
The main aim of this study was to test reliability. In addition, the oldest volunteer (23.15 years) was included to test whether image reconstruction may introduce misleading image features causing an estimation of under 18 years, a legally important age threshold indicating majority age in many countries. Based on the atlas, the maximum age a radiologist can allocate is 19 years. For the oldest volunteer, this maximum age was allocated to the image stacks of all acceleration factors. Likewise, the automatic estimation provided estimates over 18 y for all acceleration factors, which suggests that the chosen undersampling and reconstruction strategies are robust against misleading artefacts for the simulated acceleration factors.
The validity of using retrospectively undersampled data in this study was shown by a comparison of simulated and acquired images. The simulation of an accelerated acquisition removes exactly those data lines that are not acquired during an actual accelerated acquisition. Therefore, an agreement between simulated and acquired data can be anticipated. Moreover, retrospective undersampling represents a worst-case simulation, since the reduced amount of data is extracted from a long acquisition during which more patient movement can occur. For this reason, the additional acquisition of accelerated data sets was only performed for a small number of volunteers.
Our work is based on an undersampling scheme readily available on current MR scanners and therefore does not require comprehensive knowledge of undersampling strategies or MR sequence programming. The same applies to the image reconstruction algorithm, which is available to the public in an online repository. This allows for an easy adoption of our proposed method. It is also noteworthy that the automatic method did not require additional training and could readily be applied to undersampled data in its original state. By systematically increasing the undersampling of the available data, the feasibility of our approach could already be shown for the relatively small sample size used in this study. We expect to reproduce these results with more data sets, which are currently being acquired.
The potential decrease in acquisition time presented in this study is an important step towards establishing MRI as a standard method for age estimation. On successful transfer of this approach to MR acquisitions of third molars and clavicular epiphyses, the application of MRI for multi-factorial age estimation could be promoted even further due to the elimination of the drawback of time consumption. Furthermore, this also translates to a potential reduction of the cost of using this ionizing radiation-free imaging modality for age estimation.
Figure 4. Bland-Altman plots for inter-rater agreement. Agreement is shown between (a) R1 and R2, (b) R1 and the automatic method (A) and (c) R2 and the automatic method as a function of the acquisition time. µ_R1,R2, µ_R1,A and µ_R2,A describe the mean value of the age estimates of the respective raters; Δ is the difference between the respective ratings.
In conclusion, we showed the reliability of image data undersampled with the CAIPIRINHA technique in combination with TGV-based reconstruction for skeletal age estimation. A reduction of the acquisition time to 15 seconds for MR acquisitions of the hand and wrist was found to produce images interpretable using both a radiological and an automatic age estimation method. Furthermore, the high correlation between the two methods shows the potential of automatic methods to support radiologists in age estimation investigations.