Introduction

Osteoporosis, through its association with age-related fractures, is one of the most common causes of longstanding pain, functional impairment, disability, and death in elderly populations, and a major contributor to medical care costs worldwide1,2. Hip fracture, in particular, is a serious life-threatening injury, with fracture of the proximal femoral neck, intertrochanteric and/or shaft regions typically occurring from a sideways fall from standing height. Mortality after hip fracture is high (~ 10%) in the immediate post-fracture period, and remains higher than that of the general population1,3,4,5. For these reasons, effective intervention strategies to reduce the risk of hip fracture at both individual and population levels are warranted.

It is well established that exercise training is beneficial for improving bone strength (i.e., bone’s load carrying capacity or failure load), particularly when starting during early adolescence6. With exercise training, bone strength may be maintained (or increased) and hip fracture risk may be reduced in old age. In order to identify specific exercise protocols which reduce hip fracture risk, non-invasive in vivo estimates of proximal femoral strength during adolescence and early adulthood are required.

Dual energy X-ray absorptiometry (DXA) is a two-dimensional (2D) imaging technique offering measures of areal bone mineral density (aBMD) of the proximal femur. The technique is low dose (0.14 µSv [7]), and thus is suitable for adolescents and young adults [8]. DXA-based aBMD measures of the proximal femur offer modest-to-strong agreement with experimentally-derived failure load (fall configuration: R² ranging from 0.41 to 0.92 [9,10,11,12]; stance configuration: R² ranging from 0.42 to 0.71 [13,14]) (for a detailed overview, see the summary table in [15]). DXA, though, represents complex 3D structures as 2D projection images, and thus cannot distinguish between cortical and trabecular bone geometry or material properties, each of which independently contributes to proximal femoral strength [16]. Quantitative computed tomography (QCT) is a three-dimensional (3D) imaging technique offering measures of volumetric BMD (vBMD) of both cortical and trabecular bone. On their own, QCT measures of proximal femoral geometry and density offer modest predictions of failure load (fall: R² = 0.19 [9]; stance: R² = 0.66 [17]). However, when combined with computational finite element (FE) modelling (a method referred to as QCT-FE), the approach offers stronger agreement with experimentally-derived failure load (fall: R² ranging from 0.73 to 0.90 [12,18,19,20,21]; stance: R² ranging from 0.63 to 0.95 [17,19,20,21,22]). QCT, however, exposes participants to higher levels of ionizing radiation at the radiosensitive pelvic region (e.g., 2900 µSv from Khoo et al. [23]), which some may argue is ethically unacceptable for growing adolescents and fertile young adults. Accordingly, the QCT-FE technique is typically applied with elderly adult populations.

Recently, FE combined with magnetic resonance (MR) imaging (referred to as MR-FE) has seen application for identifying failure regions as well as assessing hip strength of exercise groups engaging in different levels of physical activity (high-impact, odd-impact, repetitive-impact, high-magnitude, non-impact) [24,25,26]. The key benefits of MR are that it offers multi-planar 3D images and does not expose the radiosensitive pelvis to ionizing radiation (and thus has potential for studying adolescents and young adults). Current research suggests that MR-FE is an accurate tool for estimating mechanical failure loads of the proximal femur, with strong agreement with experimentally obtained values (fall: R² = 0.85) [27]. To date, only one study has assessed the in vivo precision error of MR-FE; however, that study focused on whole-bone stiffness and elastic modulus for a small region of interest (ROI) [28]. Currently, the measurement repeatability of MR-FE mechanical outcomes (specifically bone stress and failure load) has not been reported at critical failure regions for fall and stance loading configurations.

Knowledge of the measurement error is important to establish the repeatability of the technique. Specifically, an understanding of the precision error is critical as it identifies parameters which may be best suited for future research related to MR-FE. Relatedly, knowledge of precision error can be used to determine the least significant change (LSC). The International Society for Clinical Densitometry recommends estimating the LSC to determine if observed skeletal differences are true and greater, with 95% confidence, than the measurement error29. LSC is estimated by multiplying the root-mean-square coefficient of variation (RMS-CV%) by an adjusting z-score (LSC = 2.77 × RMS-CV% for 95% confidence) and is an important quantitative metric to ensure changes are sufficiently larger than the precision error30,31. LSC is particularly important for clinical studies and for comparing bone strength differences. To date, LSCs have not been reported for MR-FE derived mechanical outcomes.
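
As a brief illustration of the LSC calculation described above, the following Python sketch shows where the 2.77 multiplier comes from: a two-sided 95% z-score of 1.96, multiplied by √2 because a change is the difference between two measurements that each carry the precision error. The function name and the example precision error of 7.5% are hypothetical and are used only for illustration.

```python
import math


def least_significant_change(rms_cv_percent, z_score=1.96):
    """Return the LSC (%) for a given precision error (RMS-CV%).

    The sqrt(2) factor accounts for the error of a difference between
    two measurements; 1.96 * sqrt(2) is approximately 2.77.
    """
    return z_score * math.sqrt(2.0) * rms_cv_percent


# Hypothetical precision error, chosen only for illustration.
example_rms_cv = 7.5
lsc = least_significant_change(example_rms_cv)
print(f"LSC for RMS-CV% = {example_rms_cv}: {lsc:.1f}%")
```

Under this reading, an observed difference smaller than the LSC cannot be distinguished, with 95% confidence, from measurement error.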

The objective of this study was to characterize the in vivo measurement precision of MR-FE mechanical outcomes of the proximal femur (bone stress and failure load, specifically) for configurations simulating fall and stance loading.

Methods

Participants

Thirteen healthy participants (5 males and 8 females) with ages ranging from 21 to 68 years (median age: 27 years), and weights ranging from 54 to 105 kg (median: 70 kg), were recruited as part of a previous study at the University of Saskatchewan32. Participant information is presented in Table 1. Study approval was obtained from the University of Saskatchewan Biomedical Research Ethics Board. All study procedures were conducted in accordance with the guidelines approved by the Biomedical Research Ethics Board and the Declaration of Helsinki. Informed consent was obtained from all study participants.

Table 1 Participant characteristics.

MRI scan parameters

MRI scans of the left proximal femur were obtained from a previous research study32. Axial images (relative to the orientation of the participant) of the hip were obtained using a clinical 1.5 T scanner (Magnetom Avanto, Siemens, Germany) with a 6-channel body array coil positioned over the hip region. Each participant was positioned supine with their left leg extended and externally rotated 15°. Scanned image volumes began ~ 2 cm superior to the femoral head and ended ~ 5 cm inferior to the lesser trochanter. A T1-weighted turbo spin echo sequence was used with the following parameters: TR 616 ms, TE 12 ms, 2 excitations, 180° flip angle, 0.45 × 0.45 mm in-plane pixel size, 4 mm slice thickness, ~ 4.5 min scan time, ~ 40 images. Each participant was scanned three times, with repositioning following a short walk between repeat scans.

Image analysis

Intensity shading inhomogeneity, commonly known as “bias field”, was present in the original MRI scans33. An open-source software platform for medical imaging (3D Slicer) was used in conjunction with a non-parametric, non-uniform intensity normalization module (N4ITK) to interactively correct the image inhomogeneity34,35. Each original scan of the proximal femur was individually loaded and processed using the correction module. Images were then qualitatively checked for shading improvement.

Using commercial software (Analyze 12.0: Mayo Foundation, Rochester, MN, USA), MRI scans were semi-automatically segmented to delineate the proximal femur from surrounding soft tissue. Each image slice was segmented in the transverse plane followed by manual correction. Subject-specific thresholds, defined via the half-maximum height (HMH) method, were used to define the periosteal boundary and separate it from the soft tissue36,37. The thresholds were defined at a site approximately 2 cm below the lesser trochanter on the femoral shaft32. All segmentations were performed by a single researcher (K.B.M.). The original discrete MRI scans and segmentations were reformatted via cubic interpolation to create isotropic cubic arrays (from 0.45 × 0.45 × 4 mm to 0.45 × 0.45 × 0.45 mm). Following interpolation, binary masks were adjusted in the coronal plane to reduce delineation precision errors caused by participant repositioning between scans.
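
The thresholding step can be illustrated with a minimal Python sketch. It assumes one common reading of the HMH method, in which the threshold is the midpoint between the minimum and maximum intensities along a profile drawn across the tissue boundary. The profile values, function names, and the rule that voxels darker than the threshold are bone (cortical bone has low signal on T1-weighted images) are illustrative assumptions, not the segmentation procedure implemented in Analyze.

```python
def hmh_threshold(profile):
    """Half-maximum height: midpoint between the lowest and highest
    intensity along a profile crossing the tissue boundary."""
    lowest, highest = min(profile), max(profile)
    return lowest + 0.5 * (highest - lowest)


def classify_profile(profile, threshold):
    """Label each sample as bone (True) when darker than the threshold.

    Assumption: cortical bone appears dark on T1-weighted images.
    """
    return [value < threshold for value in profile]


# Hypothetical intensity profile running from soft tissue into cortex.
profile = [210, 205, 190, 120, 40, 22, 20, 25]
threshold = hmh_threshold(profile)
bone_mask = classify_profile(profile, threshold)
print(threshold, bone_mask)
```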

Image volumes (scans and masks) were aligned into fall and stance loading orientations using custom coding (Matlab 2018a; MathWorks, Natick, MA, USA), as per previous proximal femoral FE studies [26,38]. Using mask data, this process involved identifying the center of the femoral head by fitting a sphere to the surface of the head via a variant of the iterative closest point algorithm [39]. The long axis of the femur (aka shaft axis) was defined by identifying the line-of-best-fit through centroids of axial slices distal to the greater trochanter. A plane was then fit to the shaft axis and the center of the femoral head. A vector corresponding with the neck was also defined by identifying the line-of-best-fit through centroids of slices in an axial-oblique orientation. This vector was then projected onto the plane containing the shaft axis and center of the femoral head. The neck axis was defined as the projected vector passing through the femoral head and intersecting with the shaft axis. This configuration was used to define the common 0° orientation with the shaft axis aligned vertically and the neck axis aligned with 0° internal/external rotation (Fig. 1). From here the images were rotated to the stance configuration (shaft long axis rotated 20° from vertical [38]) and fall configuration (shaft long axis tilted 10° with respect to the ground with the neck axis internally rotated 15° [26]) (Fig. 2).
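
The projection step in the alignment procedure can be sketched as follows. This is a minimal Python illustration, not the authors' Matlab code: it assumes the shaft axis is given by a point and a direction, builds the normal of the plane containing the shaft axis and the femoral head center, and removes the out-of-plane component of the raw neck vector. All coordinates and function names are hypothetical.

```python
import math


def cross(a, b):
    """Cross product of two 3D vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(a):
    """Scale a vector to unit length (fails for a zero vector)."""
    length = math.sqrt(dot(a, a))
    return [x / length for x in a]


def project_onto_shaft_head_plane(vector, shaft_point, shaft_dir, head_center):
    """Project a vector onto the plane containing the shaft axis and the
    femoral head center. The head center must not lie on the shaft axis,
    otherwise the plane is undefined."""
    to_head = [c - p for c, p in zip(head_center, shaft_point)]
    normal = normalize(cross(shaft_dir, to_head))
    out_of_plane = dot(vector, normal)
    return [v - out_of_plane * n for v, n in zip(vector, normal)]


# Hypothetical geometry (mm): vertical shaft axis, head center offset medially.
raw_neck_vector = [0.7, 0.2, 0.6]
neck_in_plane = project_onto_shaft_head_plane(
    raw_neck_vector,
    shaft_point=[0.0, 0.0, 0.0],
    shaft_dir=[0.0, 0.0, 1.0],
    head_center=[40.0, 0.0, 60.0],
)
print(neck_in_plane)
```

In this example the plane is the x-z plane, so the projection simply removes the y component of the raw neck vector.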

Figure 1

MRI scans were aligned into a common 0° orientation (shown) and then rotated into fall and stance configurations prior to FE model generation. Using the segmented mask data, the long axis of the femur (aka shaft axis) (a) was defined by identifying the line-of-best-fit through centroids of axial slices distal to the greater trochanter. The center of the femoral head (b) was identified by fitting a sphere to the surface of the head via a variant of the iterative closest point algorithm. A vector corresponding with the neck was also defined by identifying the line-of-best-fit through centroids of slices in an axial-oblique orientation. This vector was then projected to a plane containing the shaft axis and center of the femoral head. The neck axis (c) was defined as the projected vector passing through the femoral head and intersecting with the shaft axis. This configuration was used to define the common 0° orientation with the shaft axis aligned vertically and the neck axis aligned with 0° internal/external rotation.

Figure 2

Stance and fall loading configurations of the FE models. The shaft long axis was rotated 20° from the vertical and an initial distributed load applied over the femoral head for the stance models (a). For the fall configuration, the femoral shaft was tilted 10° with respect to the ground (b) and the neck axis was internally rotated 15° (c). The distal shaft was constrained with a hinge-type boundary condition (prohibiting displacements but allowing rotations), and the greater trochanter nodes were restrained in the direction of the distributed load.

FE modelling

FE models representative of stance and sideways fall loading configurations were generated from the realigned MRI volumes and segmentations. Using custom algorithms (Matlab), we converted each voxel into an 8-noded hexahedral element with dimensions corresponding to the 0.45 mm voxel size. Bone material properties were assumed to be linearly elastic and isotropic, with the elastic modulus of each voxel computed from the image intensity. Voxel-specific bone volume fractions (BVF) were computed from the image intensity via BVF = 1 − (Int_voxel/Int_max), as per40. A custom MRI phantom was used to verify that a linear relationship exists between image intensity and BVF (R² > 0.99) (Supplementary Material). Imaged BVF was converted to elastic modulus (E) via the equation E = 12.9 [1.08 (1 − Int_voxel/Int_max)]^2, where Int_voxel is the intensity of each voxel and Int_max is the maximum fat intensity in the scan. This equation was based upon the density-modulus equation of Öhman et al.41 for the proximal femur, combined with conversion equations linking BVF, apparent density and ash density42,43. A Poisson’s ratio of 0.3 was assumed for all elements44.
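
The intensity-to-modulus mapping above can be written out as a short Python sketch. It follows the two equations as stated in the text; the function names and the example intensities are hypothetical, and the unit of E is whatever the source equation implies (it is not stated in this section).

```python
def bone_volume_fraction(int_voxel, int_max):
    """BVF = 1 - (Int_voxel / Int_max), where Int_max is the maximum
    fat intensity in the scan."""
    return 1.0 - int_voxel / int_max


def elastic_modulus(int_voxel, int_max):
    """E = 12.9 * [1.08 * BVF]^2, as given in the text."""
    bvf = bone_volume_fraction(int_voxel, int_max)
    return 12.9 * (1.08 * bvf) ** 2


# Hypothetical intensities: a dark (bone-rich) voxel and a fat-bright voxel.
dense_voxel_modulus = elastic_modulus(int_voxel=250.0, int_max=1000.0)
fat_voxel_modulus = elastic_modulus(int_voxel=1000.0, int_max=1000.0)
print(dense_voxel_modulus, fat_voxel_modulus)
```

A voxel as bright as the maximum fat intensity has a BVF of zero and is therefore assigned zero modulus, whereas darker voxels are assigned progressively stiffer properties.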

Nodal connectivity and material properties of the proximal femur were imported into Abaqus (version 6.13, Providence, RI, USA) for loading and analysis (Fig. 2). For the loading configurations, we applied a distributed load over the femoral head. The distal shaft was fully constrained for the stance models as in previous studies20,21,38. For the sideways fall, a hinge-type boundary condition was applied on the distal shaft, and the most lateral nodes of the greater trochanter were fully constrained in the direction of the force21,26,45. For both the stance and sideways fall configurations, an arbitrary load of 1 body weight was applied (arbitrary in that the linearity of the models allowed for the results to be scaled).

FE outcomes

The FE outcomes were analyzed at 4.5 mm thick anatomical regions of interest (Fig. 3) at the femoral neck, intertrochanteric region, and shaft. The regions were selected based on common critical failure regions and automatically defined using anatomical landmarks and custom coding (Matlab)38,45. For each region and orientation, the mean von Mises stress, von Mises strain, principal stresses, and principal strains were calculated. The principal stresses and strains were used to derive failure loads from four different failure criteria (von Mises yield, brittle Coulomb-Mohr (BCM), normal principal, and Hoffman), each in stress-based and strain-based forms19,20,46,47,48. Failure theories were assessed at the three regions of interest for each configuration. The applied force was linearly scaled to determine the failure load which would cause 5% of contiguous elements to fail.

Figure 3

FE outcomes were reported at 4.5 mm thick regions at the femoral neck (center of the femoral neck axis between the head center and vertical shaft axis), intertrochanteric (bi-sector of the angle between the neck and shaft), and shaft (20 mm below the inferior edge of the lesser trochanter).

Strain and equivalent stress limits were used for cortical and trabecular bone. We assigned bone a tensile strain limit of 7000 μstrain [49,50] and a compressive strain limit of 10,000 μstrain [41]. The equivalent stress limits were assigned by multiplying the strain limits by the respective element’s elastic modulus [46]. The tensile and compressive strain limits (εyt, εyc) and stress limits (σyt, σyc) were related by the ratios εyt/εyc and σyt/σyc, both equal to 0.7 [20,51].
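
Because the models are linear, the failure load can be found by scaling the applied load rather than re-running the analysis. The Python sketch below illustrates the idea under simplifying assumptions that are not part of the study's method: it uses a single signed stress per element with a simple tension/compression check instead of the four failure criteria, and it ignores the requirement that failed elements be contiguous. All names and the example values are hypothetical.

```python
import math

TENSILE_STRAIN_LIMIT = 7000e-6       # 7000 microstrain
COMPRESSIVE_STRAIN_LIMIT = 10000e-6  # 10,000 microstrain


def stress_limits(modulus):
    """Equivalent stress limits: strain limit times the element's modulus."""
    return TENSILE_STRAIN_LIMIT * modulus, COMPRESSIVE_STRAIN_LIMIT * modulus


def utilization(stress, modulus):
    """Fraction of the relevant limit used at the reference load
    (positive stress = tension, negative stress = compression)."""
    tensile_limit, compressive_limit = stress_limits(modulus)
    if stress >= 0:
        return stress / tensile_limit
    return -stress / compressive_limit


def failure_load(reference_load, stresses, moduli, failed_fraction=0.05):
    """Scale the reference load until `failed_fraction` of elements reach
    their limit. Simplification: element contiguity is NOT checked."""
    ratios = sorted(
        (utilization(s, e) for s, e in zip(stresses, moduli)), reverse=True
    )
    n_failed = max(1, math.ceil(failed_fraction * len(ratios)))
    critical_ratio = ratios[n_failed - 1]  # must be > 0 for a finite load
    return reference_load / critical_ratio


# Hypothetical region of 10 elements (stress and modulus in MPa) loaded
# with a reference force of 700 N.
stresses = [7.0, -5.0, 3.5, 2.0, -1.0, 0.5, 1.5, -2.5, 4.0, 0.2]
moduli = [10000.0] * 10
load = failure_load(700.0, stresses, moduli)
print(f"Estimated failure load: {load:.0f} N")
```

With 10 elements, 5% rounds up to a single element, so the most heavily utilized element (7 MPa against a 70 MPa tensile limit) governs and the reference load is scaled by a factor of 10.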

Statistical analysis

We assessed short-term in vivo precision errors of each outcome using RMS-CV% (short-term refers to the case where measurements are acquired over a time period of less than 1 month, as per Bonnick et al.31)52. With 13 participants scanned 3 times, this provided 26 degrees-of-freedom (DOF = number of participants × (number of scans − 1) = 13 × 2), which approached the recommendations of Glüer et al.52. With this DOF, we established precision errors with an upper 90% confidence limit of ~ 30%. We report mean values for each outcome. Short-term precision was also assessed in absolute terms using the root mean square standard deviation (RMS-SD) of the 3 repeat measures.
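
For readers unfamiliar with these precision metrics, the following Python sketch shows one conventional way to compute RMS-SD, RMS-CV% and the degrees of freedom from repeat measurements: the per-participant standard deviation and coefficient of variation are squared, averaged across participants, and square-rooted. The data values and function name are hypothetical, and the sketch is not the analysis code used in the study.

```python
import math
from statistics import mean, stdev


def precision_errors(repeat_measures):
    """Return (RMS-SD, RMS-CV%, DOF) for a list of per-participant
    lists of repeat measurements."""
    squared_sds = []
    squared_cvs = []
    dof = 0
    for scans in repeat_measures:
        sd = stdev(scans)        # sample SD of the repeat scans
        cv = sd / mean(scans)    # coefficient of variation
        squared_sds.append(sd ** 2)
        squared_cvs.append(cv ** 2)
        dof += len(scans) - 1
    rms_sd = math.sqrt(mean(squared_sds))
    rms_cv_percent = 100.0 * math.sqrt(mean(squared_cvs))
    return rms_sd, rms_cv_percent, dof


# Hypothetical failure loads (N) for two participants, three scans each.
data = [[100.0, 102.0, 98.0], [50.0, 51.0, 49.0]]
rms_sd, rms_cv, dof = precision_errors(data)
print(f"RMS-SD = {rms_sd:.2f}, RMS-CV% = {rms_cv:.1f}, DOF = {dof}")
```

Applied to 13 participants with 3 scans each, the same degrees-of-freedom count gives 13 × (3 − 1) = 26.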

Results

Regional means

For the fall configuration, RMS-CV% precision errors of the regional unadjusted stress and strain measures averaged 7.9% and ranged from 5.3% to 11.7% (Table 2). For the stance configuration, RMS-CV% precision errors of the regional stress and strain measures averaged 7.8% and ranged from 3.3% to 11.8%. RMS-CV% for the strain measures ranged from 7.0% to 11.8%, and 3.3% to 7.9% for the stress measures. Regional stress/strain precision errors appeared similar between the femoral neck, intertrochanteric, and shaft regions.

Table 2 Precision results for the MR-FE mechanical outcomes for the fall and stance loading configuration (13 participants, 3 scans each, 26 degrees of freedom).

Failure loads

RMS-CV% precision errors for failure loads in the fall configuration averaged 7.5% and ranged from 5.8% to 9.0% (Table 3). RMS-CV% precision errors of failure loads for the stance configuration averaged 7.3% and ranged from 6.4% to 8.1%. Failure load precision errors were < 8.2% at the femoral neck, < 9.0% at the intertrochanteric region, and < 8.3% at the shaft (Table 3).

Table 3 Precision results for the MR-FE failure loads for the fall and stance loading configuration (13 participants, 3 scans each, 26 degrees of freedom).

Discussion

This study characterized short-term in vivo precision errors of MR-FE outcomes of the proximal femur for two loading configurations and three regions. To our knowledge, this is the first study to report FE precision errors at the neck, intertrochanteric, and shaft regions using MR-FE. This study complements existing studies which focused on evaluating differences in MR-FE outcomes between groups and provides indication of measurement error.

Generally, the von Mises stress, principal stresses, principal strains, and failure loads had similar precision errors (RMS-CV% < 8.3%), except for the von Mises strain criterion, which was higher (RMS-CV% < 11.8%). The high measurement error of the von Mises strain outcomes may be attributed to the small strain values, whereby a small variation resulted in a large precision error. Our FE-based in vivo precision errors are similar to (though slightly higher than) previous QCT-FE findings at the knee, which had an average RMS-CV% of < 6% [53]. Additionally, MR-FE precision errors for the two configurations are comparable, with no substantial differences. An MR precision study of bone morphology (e.g., cortical thickness) [32], which used the same scan data evaluated here, reported smaller precision errors (< 7.1%) than those reported here. However, our study considered FE outcomes of 3D volumetric ROIs, whereas Johnston et al. [32] reported metrics based on single 2D image slices.

To recommend a best-suited failure criterion for future MR-FE studies, various parameters need to be considered, including precision error (RMS-CV%), explained variance (R²), and the ability to capture changes or differences. With regards to the presented precision errors, the four failure theories assessed in this study were similar and provided measurement errors ≤ 9.0%. However, a large range of estimated failure loads may indicate a more sensitive criterion for identifying differences in bone strength for MR-FE. In this case, BCM (stress and strain) generally had the largest failure load ranges. Given this finding, and its comparable measurement error relative to the other failure criteria, BCM may best characterize hip strength. Future research is needed to evaluate experimentally-derived failure loads against MR-FE derived estimates acquired via various failure theories to identify the best-suited criterion.

Numerical failure load results from this study are similar to those published in previous research25. The estimated failure loads from our study, focused on a young adult population, ranged from 3.0 to 16.4 kN at the neck in the fall configuration. Previous experimental studies found failure loads ranging from 5.2 kN to 8.5 kN for the same site and configuration45,54,55, though these findings were specific to elderly adult (> 70 years of age) cadaveric femurs. As adult femurs are approximately twice as strong as elderly adult femurs54, our results may be comparable. Our failure load findings, though, are specific to the applied criteria (e.g., 5% of elements failing). A lower percentage of failed elements would lead to lower failure loads approaching experimental findings. Accordingly, further validation research is needed to identify the specific modelling approaches (e.g., failure criterion, percentage of failed elements) best suited for predicting failure of the proximal femur. Of note, stress and strain outcomes in this paper are presented for measurement repeatability only. The applied force magnitude of 1 body weight was arbitrary and lower than estimated failure loads. The lower applied load can explain the lower stress values (Table 2, Fig. 4) in comparison to other MR-FE research (e.g., Abe et al.26 used an impact force ~ 8 × body weight).

Figure 4

Example of the internal von Mises stress distribution under an applied load of 1 body weight for the stance (a) and sideways fall (b) loading configurations.

This research has strengths requiring consideration. First, with MR-FE, each voxel of the proximal femur was modeled as a hexahedral element, allowing us to preserve the cortical detail from the scans. Conversely, using tetrahedral elements requires intensive surface smoothing and careful strategies to map elastic moduli to elements. The surface smoothing process inherently incorporates voxels inside and/or outside the original image mask, which may lead to loss of femoral detail. Second, we applied a custom algorithm to automatically align MR scans into the fall and stance loading configurations, which reduced variation between repeat scans, leading to a lower precision error. Third, we report precision errors at three clinically relevant regions56,57 for the two loading configurations most commonly studied in the literature. The inclusion of different regions and loading configurations provides information on regional precision. Fourth, we have used a conservative sample size (13 participants, 39 scans, 26 DOF) to establish precision errors with an upper 90% confidence interval limit of ~ 30%, as proposed by Glüer et al.52. Although our study did not exactly meet the DOF recommendations (28 DOF), the upper 90% confidence limit with our DOF (31%) is comparable to recommendations (30%).

With regards to limitations, first, due to the large slice thickness (4 mm), the true 3D geometry of the femur was difficult to capture and resulted in a jagged structure. The large slice thickness may have resulted in under/over estimation of bone strength as critical bone features may not have been captured in the original scans. To more accurately characterize the shape of the proximal femur, our original scans consisting of 37 slices were interpolated to 329 slices. This approach led to a more correct shape, but small variations in material properties were not truly captured. Second, due to the poor signal-to-noise ratios on some scans, it was difficult to identify the periosteal surface within the intertrochanteric region. To segment, we defined the boundary using semi-automatic region growing and subject-specific thresholds (HMH)37, followed by manual segmentation where needed. Operator judgment had an influence on femoral segmentations and may have induced error. Third, the presented MR-FE models of the proximal femur were not validated against mechanical testing, unlike previous QCT-FE studies12,17,18,19,20,21,22. To address this, we adopted similar boundary and loading conditions as previous studies and compared our numerical results25,26,38. However, it would be beneficial to validate the MR-FE derived estimates of bone failure load, along with corresponding failure criteria, reported here. Fourth, our study assessed the short-term precision errors of relatively young adults (median age: 27 years), making it difficult to generalize our results beyond the studied age group. Still, our study provides insight into MR-FE measurement precision and supports the application of MR-FE for monitoring bone strength differences. Fifth, in this study we applied short-term precision errors to estimate LSC. Glüer et al.30, though, advise using long-term precision errors (i.e., measures taken over at least 1 year) in the LSC calculation to account for factors such as scanner calibration, drift and differences in operator technique. Unfortunately (and in line with Bonnick et al.31), we found that the logistical difficulties in performing a long-term precision study, compounded with the need to apply linear regression to account for biological changes due to growth and development, made the approach unfeasible. Accordingly, it is important to be cognizant that the LSC presented here may be underestimated.

In conclusion, this study found that short-term precision errors were less than 11.8% for the two loading configurations. Precision errors ranged from 3.3% to 11.8% for regional stress and strain mean outcomes, and 5.8% to 9.0% for failure loads. This is the first study to assess the short-term in vivo precision error of MR-FE outcomes for fall and stance loading configurations at the proximal femur. Results from this study demonstrate that MR-FE is a promising non-invasive technique for monitoring femoral strength in vivo and may guide future studies in their assessment of femoral strength.