Intra and Inter-rater Reliability between Ultrasound Imaging and Caliper Measures to determine Spring Ligament Dimensions in Cadavers

The purpose was to evaluate intra and inter-rater reliability, repeatability and absolute accuracy between ultrasound imaging (US) and caliper measures to determine Spring ligament (SL) dimensions in cadavers. SLs were identified from 62 human feet from formaldehyde-embalmed cadavers. Intra and inter-observer reliability, repeatability and absolute accuracy of SL width, thickness and length between US and caliper measurements were determined at intra and inter-session by intraclass correlation coefficients, Pearson's correlation coefficients, Student t tests, standard errors of measurement, minimum detectable changes, values of normality, 95% limits of agreement, and Bland-Altman plots. Excellent inter-session and inter-rater reliability, adequate absolute accuracy, almost perfect agreement and strong correlations were shown for caliper, US and their comparison for all SL dimensions. US measurements presented higher absolute accuracy than caliper measures for SL length and thickness dimensions, while caliper displayed greater absolute accuracy for SL width dimensions. Good repeatability (P > 0.05) was shown for all SL dimensions by US, caliper and their comparison, except for SL width dimension measured with US (P = 0.019). Both US and caliper could be recommended for all SL dimensions evaluation due to their excellent reliability and absolute accuracy in cadavers, although width dimensions should be considered with caution due to US repeatability differences.

In addition, all methods were performed in accordance with the relevant guidelines and regulations. Consent for this study was previously obtained from the anatomy department.
Cadavers and embalming method. Sixty-two feet from formaldehyde-embalmed human cadavers (8 males and 26 females) without any type of trauma were included in our research protocol 9. The mean (SD) age was 76.46 (6.46) years, with a range from 66 to 89 years. The human cadaveric feet comprised 30 right and 32 left feet. The adult cadavers came from the Scientific Anatomy Center, S.L., in the town of Valencia (Spain), which included informed consent as part of the cadaver donation process.
The preservation method used for embalming the cadavers was perfusion through the femoral artery with a blend of formaldehyde, ethanol, methanol, phenol and glycerine, which improves the longevity of the body and tissues and reduces the infection risk 16.

Ultrasound measurements. US images were recorded by a Mindray Z6 Digital Ultrasonic Diagnostic System (Shenzhen Mindray Bio-Medical Electronics Co., Ltd, Shenzhen, China) by using a linear transducer type L4-P with a frequency bandwidth range of 5-10 MHz.
All human cadaver feet were placed in the same immobilized neutral position. Then, two independent and experienced musculoskeletal podiatrists (with at least 5 years of musculoskeletal US experience) collected the US measurements to determine the width, thickness and length (cm) of the SL in cadaveric feet (Fig. 1).
Caliper measurements. Thereafter, the cadaveric foot dissection was carried out in order to expose the SL for its measurement with a digital LCD caliper (BURG-WÄCHTER KG, Wetter, Germany) with the subtalar joint in neutral position. Two podiatrists recorded the width, thickness and length (cm) of the SL in cadaveric feet with this device (Fig. 2).
Reliability study protocol. After two days, the protocol was repeated identically to the first measurement session. The measurement values from the 1st and 2nd sessions as well as from the 1st and 2nd observers were used to analyze the intra and inter-rater reliability at intra and inter-session. The podiatrists did not have access to the records of the 1st session until the values of the 2nd session had been registered.

Statistical analyses.
Statistical analyses were carried out with the SPSS 19.0 statistical package for Windows (SPSS Inc., Chicago, USA). First, the Kolmogorov-Smirnov test was used to assess normality. All variables were treated as parametric data because a normal distribution was shown (P-value > 0.05 in the Kolmogorov-Smirnov test). Second, mean ± standard deviation (SD) as well as upper and lower limits of the 95% confidence interval (CI) were used to describe all data. Finally, differences between two measurement values were analyzed by the Student t test for paired samples.
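As an illustration of the paired comparison described above, the following minimal sketch computes the paired-samples Student t statistic from two sets of repeated measurements. It is not the SPSS procedure used in the study; the function name `paired_t_statistic` and the example values are hypothetical, and the P-value lookup against the t distribution is left out.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Return (t, degrees_of_freedom) for a paired-samples Student t test."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # The paired test works on the per-specimen differences.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical lengths (cm) from two sessions, not data from the study.
session_1 = [2.0, 2.5, 3.0, 3.5]
session_2 = [1.0, 2.0, 2.0, 4.0]
t_value, dof = paired_t_statistic(session_1, session_2)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
```

The resulting t value would then be compared against the t distribution with n − 1 degrees of freedom to obtain the P-value.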
The 95% limits of agreement (LoA) between sessions and devices expressed the degree of error proportional to the mean of the measurement units, and these statistics were calculated using the methods described by Bland and Altman 11. If the measurements tended to agree, the differences between them were close to zero.
Standard errors of measurement (SEM) were calculated to measure the range of error of each parameter. The SEM was calculated from the ICCs and SDs for each of the three measurements according to the formula SEM = SD × sqrt (1 − ICC). The minimum detectable change (MDC) was calculated from the SEM values by the formula MDC = sqrt (2) × 1.96 × SEM at a 95% CI, which reflected the magnitude of change necessary to be confident that a change was not the result of random variations or measurement errors. Both SEM and MDC were analyzed according to Bland and Altman 12. Furthermore, values of normality (VN) of the sample for all outcome measurements were obtained by the formula VN = Mean ± 1.96 × SD.
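The three formulas in this paragraph can be sketched as follows. This is only an illustrative sketch: it takes the MDC in its usual 95% form, sqrt(2) × 1.96 × SEM, and the VN as Mean ± 1.96 × SD, and the function names and input values are hypothetical rather than taken from the study tables.

```python
import math


def sem(sd, icc):
    """Standard error of measurement: SEM = SD * sqrt(1 - ICC)."""
    if icc > 1.0:
        raise ValueError("ICC cannot exceed 1")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem_value):
    """Minimum detectable change at 95% CI: MDC = sqrt(2) * 1.96 * SEM."""
    return math.sqrt(2.0) * 1.96 * sem_value


def normality_values(mean, sd):
    """Values of normality: (Mean - 1.96 * SD, Mean + 1.96 * SD)."""
    return mean - 1.96 * sd, mean + 1.96 * sd


# Hypothetical inputs (cm), not values reported in the study.
example_sem = sem(sd=0.10, icc=0.96)
example_mdc = mdc95(example_sem)
low, high = normality_values(mean=2.0, sd=0.10)
print(f"SEM = {example_sem:.3f} cm, MDC = {example_mdc:.3f} cm")
print(f"VN = {low:.3f} to {high:.3f} cm")
```

Because the SEM scales with sqrt (1 − ICC), an ICC close to 1 drives both the SEM and the MDC towards zero.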
Finally, Bland-Altman plots 11,12 were calculated to display the agreement between US and caliper. These plots showed the difference between each pair of measurements on the y-axis against the mean of each pair of measurements on the x-axis. A P-value < 0.05 with a 95% CI was used for the data analysis.
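A minimal sketch of the quantities behind such a plot is given below, assuming the standard Bland and Altman construction in which the 95% limits of agreement are the mean difference ± 1.96 × SD of the differences. The function name `bland_altman` and the example measurements are hypothetical; plotting itself is omitted.

```python
import statistics


def bland_altman(us, caliper):
    """Return plot points, bias and 95% limits of agreement for paired data."""
    if len(us) != len(caliper) or len(us) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # y-axis: difference of each pair; x-axis: mean of each pair.
    diffs = [u - c for u, c in zip(us, caliper)]
    means = [(u + c) / 2.0 for u, c in zip(us, caliper)]
    bias = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences
    return {
        "points": list(zip(means, diffs)),
        "bias": bias,
        "lower": bias - 1.96 * sd_diff,
        "upper": bias + 1.96 * sd_diff,
    }


# Hypothetical SL lengths (cm), not data from the study.
us_values = [2.1, 2.4, 2.0, 2.6]
caliper_values = [2.0, 2.5, 2.1, 2.4]
result = bland_altman(us_values, caliper_values)
print(f"bias = {result['bias']:.3f} cm")
print(f"95% LoA = {result['lower']:.3f} to {result['upper']:.3f} cm")
```

Each entry of `points` is one specimen, so a scatter of these pairs with horizontal lines at the bias and at both limits reproduces the layout described in the text.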
Analysis of reliability of the SL dimensions by first observer between US and caliper measurements (Table 5) showed excellent intra-rater reliability (ICC (1-1) = 0.877-0.978) with a strong correlation (r = 0.805-0.957; P < 0.001) for length and thickness measurements. Nevertheless, poor intra-rater reliability (ICC (1-1) = 0.207) with a weak non-significant correlation (r = 0.127; P > 0.05) was shown for width measurements. In addition, there were inter-session statistically significant differences (P < 0.05) between US and caliper measurements for thickness and width, but not for length measurements (P > 0.05).
Analysis of reliability of the SL dimensions by second observer between US and caliper measurements (Table 6) showed excellent intra-rater reliability (ICC (1-1) = 0.862-0.996) with a strong correlation (r = 0.781-0.993; P < 0.001) for length and thickness measurements. Nevertheless, poor intra-rater reliability (ICC (1-1) = 0.232) with a weak non-significant correlation (r = −0.104; P > 0.05) was shown for width measurements. In addition, there were inter-session statistically significant differences (P < 0.05) between US and caliper measurements for thickness, but not for length and width measurements (P > 0.05).
Analysis of reliability of the SL dimensions by US between inter-session first and second observer (Table 7) showed excellent inter-rater reliability (ICC (1-1) = 0.938-0.994) with a strong correlation (r = 0.893-0.989; P < 0.001) for length, thickness and width measurements. Nevertheless, there were inter-rater statistically significant differences (P < 0.05) between first and second observer for width measurements, but not for length and thickness measurements (P > 0.05).
Analysis of reliability of the SL dimensions by caliper between inter-session first and second observer (Table 8) showed excellent inter-rater reliability (ICC (1-1) = 0.825-0.998) with a strong correlation (r = 0.725-0.998; P < 0.001) for length, thickness and width measurements. In addition, there were no inter-rater statistically significant differences (P > 0.05) between first and second observer for length, thickness and width measurements.
Analysis of reliability and correlation of the SL dimensions between inter-session US and caliper measurements for both observers (Table 9) showed an excellent inter-rater reliability (ICC (1-1) = 0.911-0.966) with a strong correlation (r = 0.852-0.937; P < 0.001) for length, thickness and width measurements. In addition, there were no inter-session statistically significant differences (P > 0.05) between US and caliper measurements for length, thickness and width.
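For readers who wish to reproduce coefficients of this kind, the sketch below computes a single-measurement intraclass correlation coefficient. It assumes that "ICC (1-1)" denotes the one-way random-effects form of Shrout and Fleiss, ICC(1,1) = (BMS − WMS) / (BMS + (k − 1) × WMS), which the text does not state explicitly; the function name `icc_1_1` and the example ratings are hypothetical.

```python
def icc_1_1(ratings):
    """One-way random-effects, single-measurement ICC.

    ratings: one row per specimen, each row holding k repeated measurements.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least 2 specimens")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("every specimen needs the same k >= 2 measurements")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-specimen mean square (BMS) and within-specimen mean square (WMS).
    bms = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    wms = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    denominator = bms + (k - 1) * wms
    if denominator == 0:
        raise ValueError("no variance in the data; ICC is undefined")
    return (bms - wms) / denominator


# Hypothetical thickness values (cm) from two observers, not study data.
example_ratings = [[2.0, 2.2], [3.0, 3.0], [4.0, 3.8]]
example_icc = icc_1_1(example_ratings)
print(f"ICC(1,1) = {example_icc:.3f}")
```

Values near 1 indicate that most of the variance lies between specimens rather than between repeated measurements of the same specimen.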
The LoA (95% CI) of the measurements using both devices, US and caliper, showed values for all dimensions which tended to almost perfect agreement, showing no variability. Figures 2-4 showed the Bland-Altman plots for length, thickness and width dimensions, respectively, between US and caliper measurements. For each variable and almost every specimen, the difference between the devices' means fell within the 95% CI of all measurements.

Discussion
Several investigations about dimensions of the SL have used magnetic resonance imaging (MRI) to evaluate the anatomy of this structure in cadaveric feet 6,19,20.
According to repeatability analyses 10-13, our measurements showed good repeatability (P-value > 0.05) for the SL dimensions by US (Table 7), caliper (Table 8) and the comparison between both tools (Table 9) between the inter-session values of the first and second observers, except for the SL width dimension measured with US (P-value = 0.019). Although SL width dimensions should be considered with caution due to these US repeatability differences, to the authors' knowledge, our study may be considered the first research work providing reliability, absolute accuracy, correlation and repeatability for the SL width dimension measured by US, because prior US reliability studies mainly focused on SL length and thickness 5-8.
In addition, MDC values for the SL dimensions, such as length (MDC US = 0.069 cm versus MDC Caliper = 0.083 cm), thickness (MDC US = 0 cm versus MDC Caliper = 0.021 cm) and width (MDC US = 0.013 cm versus MDC Caliper = 0 cm), showed that US measurements presented a higher absolute accuracy with lower MDC values than caliper measures for SL length and thickness dimensions, while caliper displayed greater absolute accuracy with a lower MDC for SL width dimensions. Given that the MDC may be used as the magnitude of change necessary to be confident that these values are not the result of random variations or measurement errors 12, these MDCs may be considered as cut-off reference values to determine SL dimension modifications secondary to anatomic abnormalities 5-8, ultrasound-guided invasive procedures 9, and the course of ligament injuries after treatment 21,22.
In accordance with our findings suggesting that these two techniques may be accurate for determining SL dimensions in human cadaveric feet, Harish et al. showed that US may be an effective imaging tool to evaluate SL abnormalities in patients with symptomatic posterior tibial tendon conditions compared to MRI as the gold standard tool 23. In addition, Crim 24 stated that MRI may be considered as the first-line evaluation procedure for the assessment of the SL conditions. Nevertheless, our study findings did not consider US and caliper measurements under SL conditions, while US and MRI have already been compared showing excellent findings 23. As a future research line, we propose that both US and caliper reliability should be studied under SL pathologies.
The present study provides a technical US evaluation of SL dimensions compared with caliper measures as the gold standard, which may be used as a reference for ultrasound-guided procedures in formaldehyde-embalmed human cadavers 9. Future studies should consider these procedures in fresh-frozen cadavers as well as in vivo with healthy subjects and SL injured patients 21,22.
Several limitations should be recognized regarding our approach for anatomical dissection and US procedures. We could not determine the whole SL complex morphology and its anatomic variations, and further investigation is needed in this field. First, only 2 observers were compared in the present study, and future research studies should consider several observers for better accuracy. Second, echogenicity changes could have modified the ability to perform the ultrasound measurements of ligament morphology, especially for the width dimensions, which showed worse accuracy in the present study. The tissues had been infused with formalin for preservation, and this procedure can lead to asymmetric contraction of the tissue secondary to its anisotropic nature 9.
Conclusion
Both US and caliper could be recommended for all SL dimensions evaluation due to their excellent reliability and strong correlation in cadavers, although width dimensions should be considered with caution due to US repeatability differences.