Analysis of the concurrent validity and reliability of five common clinical goniometric devices

Measurement errors play an important role in the development of goniometric equipment, that is, devices used to measure range of motion. Reasonable validity and reliability are critical for both the device and the examiner before and after testing in human subjects. The objective of this study was to evaluate the concurrent validity and reliability of five different clinical goniometric devices in order to establish an acceptable measurement error margin for a novel device. We explored the validity and the inter- and intrarater reliability of five goniometric devices, namely (i) the universal goniometer (UG), a two-armed hand-held goniometer; (ii) the inclinometer (IC), featuring a single base, fluid level, and gravity-weighted inclinometer; (iii) the digital inclinometer (DI), functioning as both a DI and dynamometer; (iv) the smartphone application (SA), employing gyroscope-based technology within a smartphone platform application; and (v) the modified inclinometer (MI), a gravity pendulum-based inclinometer equipped with a specialized fixing apparatus. Measurements were obtained at 12 standard angles and 8 human shoulder flexion angles ranging from 0° to 180°. Over two testing sessions, 120 standardized angle measurements and 160 shoulder angle measurements from 20 shoulders were taken repeatedly by three examiners for each device. The intraclass correlation coefficient (ICC), standard error of measurement (SEM), and minimal detectable change (MDC) were calculated to assess reliability and validity. Concurrent validity was also evaluated using the 95% limits of agreement (95% LOA) and Bland–Altman plots, with comparisons made to the UG. The concurrent validity for all device pairs was excellent in both study phases (ICC > 0.99, 95% LOA − 4.11° to 4.04° for standard angles, and − 10.98° to 11.36° for human joint angles). 
Inter- and intrarater reliability scores for standard angles were excellent across all devices (ICC > 0.98, SEM 0.59°–1.75°, MDC 1°–4°), with DI showing superior reliability. For human joint angles, device reliability ranged from moderate to excellent (ICC 0.697–0.975, SEM 1.93°–4.64°, MDC 5°–11° for inter-rater reliability; ICC 0.660–0.996, SEM 0.77°–4.06°, MDC 2°–9° for intra-rater reliability), with SA demonstrating superior reliability. However, measurements at wider angles showed reduced device reliability. In conclusion, our study demonstrates that it is essential to assess measurement errors independently for standard and human joint angles. The DI is the preferred reference for standard angle testing, while the SA is recommended for human joint angle testing. Separate evaluations across the complete 0°–180° range offer valuable insights.

been developed to measure mobility, ranging from simple visual examination to complex three-dimensional mobility assessment, in order to support the overarching goals of the Sustainable Development Goals, particularly in the areas of Good Health and Well-being. Over time, the development of medical tools for ROM measurement has proceeded in parallel with general developments in technology; examples include the electrogoniometer 7; goniometers with short or long arms 8; laser projection with a Halo digital goniometer (laser projection used as a goniometric arm) 9; photogrammetry software 10; digital goniometers 11; the Hawk goniometer, a digital-based goniometer with a plastic, parallel-piped sensor and internal gyroscope 12; inertial sensors for real-time monitoring 13; and smartphone applications (SA), which employ an inertial measurement unit (IMU)-based goniometer 14,15. However, any technology that does not provide valid and reliable measurements is not a suitable basis for clinical decisions. Moreover, portability, cost, convenience, and suitability for everyday rehabilitation practice remain gaps to be addressed in the further development of goniometric devices.
To be clinically useful, ROM measurement tools must be confirmed for validity and reliability. Studies on the reliability of equipment for measuring ROM 16 have shown the influence of instrumentation, procedures, differences in movement direction, differences between body parts, and different patient types. Peters et al. 17 found inconsistency in the reliability of goniometric devices for assessing ROM from one clinician to another. Validity and reliability can be affected by irregularities during measurement; for example, bony landmark positioning, accuracy, the examiner's consistency in establishing the zero point, and the positioning of the instrument against the target body segment 16,18 may all increase the risk of error. Human soft tissue and the inability to "see" joint centers and bone must be considered in addition to the measurement properties of goniometric devices, because they significantly affect examiner factors. ROM measurement errors therefore stem from three sources: the device, the examiner, and the patient 19,20. Ideally, validity and reliability should be transparently investigated. Errors emerging from the equipment should be minimized during the development process.
In the process of developing a ROM measurement device, it is essential to address two of the three primary sources of variability 19 .First one must address variability inherent in the capacity of the device to quantify angular differences.Second one must accommodate variability arising from the examiner's skill in using a device to measure angles.Thereafter, human-specific factors will also contribute to measurement variability.A previous study addressed this point by using standard angles to account for the human variability factor, namely Carvalho et al. 10 .They examined the reliability and reproducibility of goniometric measurements, compared with hand photogrammetry, by using standard angles with a wax hand mold.Volunteer examiners were instructed to position the fulcrum of the goniometer, corresponding to the axes of each joint, according to their clinical experience; then, photographic records were taken for analysis.Wellmon et al. 19 examined the concurrent validity and interrater reliability of two goniometer mobile applications, the inclinometer (IC), and universal goniometer (UG)-by applying standardized angles from wooden models.This effectively fixed patient factors that can affect repeated measurements, enabling examination of concurrent validity and interrater reliability relating to examiner skill and the accuracy of smartphone devices and applications for determining angular excursion.Unfortunately, the acceptable degree of reliability exhibited by the equipment and examiner without patient factors has not been clearly described in the literature.This gap subsequently influences the success of the invention of a new goniometric device by limiting the inventor's ability to proceed to the next step of conducting a study on human joints.
An in-depth analysis of measurement error, originating from the precision of equipment and the expertise of examiners, in the context of both standard joint assessments and human joint measurements, is imperative.Drawing upon the knowledge gained from widely adopted clinical instruments or gold standard can provide valuable guidance for the development of new measurement tools 5 .Although radiographic measurement has been acknowledged as the gold standard 21 , it results in unnecessary radiation exposure and cannot necessarily be used reliably to measure changes in ROM 9 .Meanwhile, UG and IC have been most extensively implemented in clinical settings since the past because of their portability, low cost, convenience, and reasonable validity and reliability [22][23][24] .Numerous studies have indicated that the intra-and interrater reliabilities of the UG in the assessment of human joint ROM were excellent, with intraclass correlation coefficient (ICC) values consistently exceeding 0.90 8,23,24 .The certainty of the application of UG to clinical practice was reinforced by the fact that the validity of UG compared with that of radiographic measurements was high, as indicated by an ICC value of > 0.90 23,24 .Although the reliability and validity of IC for measuring human joint ROM varied from poor to excellent, it has been widely used to measure spinal ROM 25,26 .The digital inclinometer (DI) is portable, accurate, and reliable; therefore, theoretically, it can be applied in practice 18,27,28 ; however, it comes at a higher cost than both the UG and IC 29 .Recently, our approach to patient management in rehabilitation practice has evolved due to the impact of novel technologies and the use of computer-based applications (apps).A recent systematic review of the validity and reliability of SAs for ROM measurement has sufficiently supported their viability as goniometer substitutes 14 .Because the IC can be modified (modified inclinometer [MI]) by attaching a fixing 
apparatus to free the examiner's hand, reading the scale, stabilizing the extremity, and guiding movement can be accomplished by one examiner 30 .MI has been used in a particular rehabilitation approach; however, its validity and reliability have been strongly confirmed.In conclusion, the UG, IC, SA, DI, and MI have gained popularity in clinical settings because of their ease of access and compatibility in terms of size and weight, making them convenient choices for diverse applications across different settings.However, note that each device comes with its unique set of advantages and disadvantages, which has led to their selection in different settings (Table 1).Therefore, an analysis of the validity and reliability of these different angular measurement devices constitutes a priority research gap that should be addressed to determine the inherent technical error, which should be taken as a reference while developing any given new device.
This study aimed to explore the concurrent validity and intra-and interrater reliabilities of five goniometric devices (i.e., UG, IC, SA, DI, and MI) by focusing on examiner factors and the measurement error of the devices.This study was conducted to provide valuable insights into setting thresholds for measurement error to help

Raters and samples
All three examiners were physical therapists with > 10 years of experience. To standardize the angles measured in the test-retest design, a testing apparatus was developed to simulate the movement of the shoulder joint, which has the largest arc of movement among human joints (Fig. 1). The apparatus consisted of two arms joined together at one end to form the axis of movement. The first arm was slightly curved, mimicking the humerus. The second arm was a straight, stationary arm fixed at one end to the wooden base. The axial end of the straight arm held a circular fitting with 16 holes used to fix the two arms in relation to a specific measurement angle. Twelve angles were set, ranging from 0° to 180°. Each angle was measured for 10 trials; thus, there were 120 measurements in total for each examiner with each device.
During the human joint angle measurement phase, measurements were taken from a group of 20 healthy shoulders, consisting of 10 individuals (5 males and 5 females) with an average age of 23.10 ± 3.25 years, an average weight of 68.70 ± 21.33 kg, an average height of 166.60 ± 6.88 cm and an average body mass index of 24.74 ± 7.49 kg/m². Each shoulder was assessed at 8 different angles, ranging from 0° to 180°. This resulted in 160 measurements for each examiner using each device. To assess reliability, measurements of each of the three standardized angles and each of the two shoulder flexion angles were analyzed, ensuring that at least 30 heterogeneous samples were examined 32. Groups of three sequences of standardized angles and groups of two sequences of human shoulder flexion angles lying in the same quarter of the semicircle were analyzed, as follows: 1st quarter, 0°-45°; 2nd quarter, > 45°-90°; 3rd quarter, > 90°-135°; 4th quarter, > 135°-180°.
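The quarter-based grouping described above can be sketched in a few lines of Python. This is only an illustration: the function name `quarter_of` and the constants are hypothetical, and the sample-size arithmetic assumes that the 12 standard angles and 8 shoulder angles are spread evenly over the four quarters (three and two per quarter, as the text states).

```python
def quarter_of(angle_deg):
    """Return the quarter (1-4) of the 0-180 degree semicircle for an angle.

    Boundaries follow the text: 0-45, >45-90, >90-135, >135-180.
    """
    if not 0 <= angle_deg <= 180:
        raise ValueError("angle must lie between 0 and 180 degrees")
    if angle_deg <= 45:
        return 1
    if angle_deg <= 90:
        return 2
    if angle_deg <= 135:
        return 3
    return 4


# Sample counts stated in the text (per examiner, per device).
STANDARD_ANGLES, TRIALS_PER_ANGLE = 12, 10
SHOULDERS, SHOULDER_ANGLES = 20, 8

standard_total = STANDARD_ANGLES * TRIALS_PER_ANGLE        # 120 measurements
shoulder_total = SHOULDERS * SHOULDER_ANGLES               # 160 measurements

# Samples per quarter, assuming an even spread of angles over four quarters.
standard_per_quarter = (STANDARD_ANGLES // 4) * TRIALS_PER_ANGLE  # 3 angles x 10 trials
shoulder_per_quarter = (SHOULDER_ANGLES // 4) * SHOULDERS         # 2 angles x 20 shoulders
```

Under these assumptions each quarter contains 30 standardized-angle samples and 40 shoulder samples, which is consistent with the stated minimum of 30 samples per analysis.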

Procedures
The same evaluation conditions were maintained for each examiner at each testing session, encompassing both study phases. Before data collection for each phase, all examiners participated in a practice session to clarify the study procedure and measurement methods for all devices. Three examiners (Researchers B.S., N.L., and W.S.) measured the standardized angles and human shoulder joint flexion angles using each device (i.e., UG, IC, DI, SA, and MI) in a random order. Each standardized angle and each shoulder flexion angle of every participant underwent multiple measurements by each examiner using every designated device. The measurements were performed in two testing sessions, with a 2-week gap between sessions for standardized angle measurements and a 2-day gap for shoulder flexion angle measurements. The assignment and order of the 12 standardized angles and 8 shoulder flexion angles for each participant were randomly determined by Researcher S.W. To blind the examiners to the data recorded, readings were taken by a second investigator (Researcher S.K.) and recorded by an assistant researcher. Whole numbers at 1° increments were recorded.
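As an illustration of the randomization step, the sketch below draws a random device order and a random angle order for one examiner. It is not the procedure actually used in the study, which is not specified beyond being random; the function name `randomized_schedule`, the use of angle indices instead of actual angle values, and the optional seed are assumptions made for the example.

```python
import random

# Device abbreviations as used in the text.
DEVICES = ["UG", "IC", "DI", "SA", "MI"]


def randomized_schedule(n_angles, seed=None):
    """Return (device_order, angle_order) for one examiner and session.

    Angles are represented by their index 1..n_angles, because the
    individual angle values are not listed in the text.
    """
    rng = random.Random(seed)  # a seed makes the schedule reproducible
    device_order = DEVICES[:]
    rng.shuffle(device_order)
    angle_order = list(range(1, n_angles + 1))
    rng.shuffle(angle_order)
    return device_order, angle_order


# Example: schedule for the 12 standardized angles.
devices, angles = randomized_schedule(12, seed=1)
```

Fixing the seed is a design choice that lets a schedule be regenerated and audited later; leaving it as `None` gives a fresh random order each time.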
The process of establishing shoulder flexion angles for all participants was meticulously performed while they were in the supine position (lying face upwards).To maintain precision and consistency in the starting position for each testing instance, markers were strategically placed to delineate the positions of the entire trunk and the testing arm on the bed.This careful approach was instrumental in achieving a uniform starting point for all measurements.Shoulder flexion angles were systematically determined using a polyvinyl chloride (PVC) pipe that featured distinctive markings on both the PVC pipe itself and the bed.This standardization process was diligently supervised by the same assistant researcher for all participants, ensuring that the angle settings were accurate and consistent across the board.During the human testing phase, the specific shoulder flexion angle was meticulously set by the designated assistant researcher.Subsequently, the examiner responsible for the final adjustments and alignment played a critical role in ensuring that the measurement device was precisely positioned before making measurements.

Goniometric measurements
Figures 2 and 3 show the measurement procedures used for all goniometric devices. To blind the examiner to the readings, the scale, screen or monitor of each device was directed away or covered.
The UG used in this study was a 12-inch transparent plastic model, specifically the Baseline® Model 12-1000 (Fabrication Enterprises, White Plains, NY, USA), featuring a protractor scale, two arms, and a fulcrum.The IC used was a 180° Baseline Bubble® (Fabrication Enterprises), which operates based on fluid levels.These two devices present a 360° scale with 1° increments.To measure angles using the UG, the examiners positioned the fulcrum of the UG on the axis of the apparatus (Fig. 2a and b) or acromion process of the participant's shoulder and aligned the UG's stationary and movable arms to the arms of the apparatus or the participant's humerus and trunk (Fig. 3a and b).For the IC, the examiners positioned the base of the IC against the two arms of the apparatus (Fig. 2c and d) or the participant's humerus in two consecutive positions (Fig. 3c and d).
The gyroscope-based goniometer was a Samsung Galaxy Note Fan edition smartphone running the Goniometer Records application (Indian Orthopedic Research Group, www.iorg.co.in/2013/05/goniometer-records-mobile-app/). This application was chosen because it is free on Google Play and has been reported to be accurate 19,33. During the measurement process, the alignment of the smartphone's edge with either the arms of the apparatus (Fig. 2e and f) or the participant's humerus was performed in two consecutive steps (Fig. 3e and f).
In this study, the MicroFET® 3 DI (Hoggan Scientific, Salt Lake City, UT, USA) was used. This device was chosen for its versatility, as it can serve as both a handheld dynamometer and a DI. It is known for its cost-effectiveness and ease of implementation in a clinical setting 34. For measurements, the examiner placed the device parallel to the stationary arm of the apparatus or the participant's humerus at the starting position. The angle reading was recorded when the examiner aligned the device with the movable arm of the apparatus (Fig. 2g and h) or the participant's humerus at the final position and pressed the "Final Setting" button on the side of the device (Fig. 3g and h).
In this study, the MI employed was a gravity pendulum-based IC originally designed as a low-cost goniometer.Modifications were made to this device, including the addition of an adjustable scale and a gravity pendulum reading scale.Furthermore, a fixing apparatus was used with the inclinometer.This design was proposed in order to free the examiner's hands for controlling unwanted movements during ROM measurements.During measurement, the device was attached to the patient, allowing the examiner to use their hands to support the patient's movement.To measure the sample angles in this study, the examiner fixed the device to the movable arm of the apparatus or the participant's arm and set the zero scale when the movable arm remained in its starting position.To ensure that the examiner was blinded, the readings were observed and recorded by a second investigator as the movable arm of the apparatus (Fig. 2i and j) or the participant's arm moved into the final position (Fig. 3i and j).

Statistical analysis
Descriptive statistics of the 12 standardized angles and 8 human joint angles measured by all examiners using all devices in both testing sessions were calculated. The ICC values of the two-way mixed model were calculated to describe concurrent validity and inter- and intrarater reliabilities. These analyses were performed separately for the two study phases: standard angle measurement and human joint angle measurement. Inter- and intrarater reliabilities were classified by ICC as follows: poor, < 0.5; moderate, 0.5-0.75; good, 0.75-0.9; and excellent, > 0.9 35. As an additional examination of concurrent validity and reliability, the standard error of measurement (SEM) was calculated in relation to the ICC using the following formula: SEM = standard deviation (SD) × √(1 − r), where r is the reliability coefficient (here, the ICC) 35,36. The SEM is often employed for clinical measurement procedures to avoid intersample variability 37. A lower SEM implies greater measurement accuracy. To distinguish true changes in ROM from random error, the minimal detectable change (MDC) at the 90% confidence level was calculated using the following formula: MDC = 1.65 × SEM × √2 37. To reflect the smallest unit of measurement of all goniometric devices, the MDC values were rounded to the nearest degree. The concurrent validity between two measurement devices was described as reasonable when the ICC was > 0.90 35. Furthermore, agreement and systematic differences between measurement devices were examined using Bland-Altman plots. Differences relative to the range of true measurements were assessed using 95% limits of agreement (95% LOA), calculated as follows: mean difference between devices ± 1.96 × SD 35,38.
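The reliability statistics above can be reproduced with a short Python sketch. The SEM and MDC functions follow the formulas given in the text. The ICC function is an assumption: the text specifies only a two-way mixed model, so the sketch uses the single-measurement consistency form, often written ICC(3,1), computed from ANOVA mean squares. The function names and the handling of values that fall exactly on a category boundary are likewise illustrative choices, not the authors' implementation.

```python
import math

import numpy as np


def icc_consistency_single(scores):
    """ICC(3,1): two-way mixed model, consistency, single measurement.

    scores: 2-D array, rows = measured targets (angles), columns = raters
    or sessions. The specific ICC form is an assumption (see lead-in).
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between targets
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


def sem(sd, icc):
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - icc)


def mdc90(sem_value):
    """Minimal detectable change at 90% confidence: 1.65 * SEM * sqrt(2)."""
    return 1.65 * sem_value * math.sqrt(2.0)


def classify_icc(icc):
    """Map an ICC to the categories used in the text.

    Boundary values (exactly 0.5, 0.75, 0.9) are assigned to the higher
    or lower band by an arbitrary choice made for this sketch.
    """
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Worked arithmetic using an SEM value reported in the abstract.
mdc_example = round(mdc90(1.75))  # 1.65 * 1.75 * sqrt(2) is about 4.08
```

With the largest standard-angle SEM reported in the abstract (1.75°), the formula gives an MDC of about 4.08°, which rounds to the reported upper bound of 4°.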

Ethics approval and consent to participate
All participants signed a consent form before testing.The study was conducted according to the Declaration of Helsinki and was approved by the Ethics Committee of Burapha University under protocol number HS014/2566(C1) and IRB number IRB1-070/2566.
In the analysis of concurrent validity, the ICC, SEM, MDC, 95% LOA, and mean of differences between device pairs are shown in Table 4. All device pairs demonstrated ICC values exceeding 0.99 for both standard angle and human joint angle measurements. For measuring standard angles, the three device pairs that included UG, IC, and DI showed an SEM within 1°, MDC within 2°, and 95% LOA between − 2.69° and 3.00°. Device pairs that included SA, MI, and other devices demonstrated a trend toward a greater SEM, MDC, and 95% LOA (0.92°-1.32°, 2°-3°, and − 4.11° to 4.04°, respectively). When measuring human joint angles, device pairs that included UG, IC, SA, and DI showed an SEM within 3°, MDC within 8°, and 95% LOA between − 10.98° and 8.41°. In contrast, device pairs that included MI and other devices tended to have higher SEM, MDC, and 95% LOA values (4°, 7°-9°, and − 10.38° to 11.38°, respectively). The Bland-Altman plots of each pair, demonstrating their scatter, are shown in Figs. 4 and 5.
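The 95% limits of agreement reported for each device pair follow the standard Bland-Altman calculation (mean difference ± 1.96 × SD of the differences). The sketch below is a minimal illustration with made-up readings; the function name `bland_altman_loa`, the example data, and the use of the sample standard deviation (`ddof=1`) are assumptions, not details taken from the study.

```python
import numpy as np


def bland_altman_loa(device_a, device_b):
    """Return (mean difference, lower LOA, upper LOA) for paired readings.

    LOA = mean difference +/- 1.96 * SD of the paired differences.
    The sample SD (ddof=1) is used here as an assumption.
    """
    a = np.asarray(device_a, dtype=float)
    b = np.asarray(device_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("paired readings must have the same length")
    diff = a - b
    mean_diff = diff.mean()
    sd_diff = diff.std(ddof=1)
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Hypothetical paired readings in degrees (not study data).
ug_readings = [10, 20, 30, 40]
di_readings = [11, 19, 31, 39]
bias, loa_lower, loa_upper = bland_altman_loa(ug_readings, di_readings)
```

In a Bland-Altman plot, each paired difference is plotted against the pair's mean, with horizontal lines at the bias and at the two limits.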
Interrater analysis for all measurement devices suggested excellent reliability for each standardized angle and the overall ROM (ICC between 0.980 and 0.999). DI showed the lowest SEM (0.61°-1.05°) and MDC (1°-2°) for each standardized angle. UG and IC had an SEM within 1.48° and MDC within 3°. SA and MI showed a trend toward lower reliability, with a greater SEM (0.59°-1.75°) and MDC (1°-4°) than the other devices. All devices, except for DI, tended to have lower interrater reliability (with a greater SEM and MDC) under wider ROM conditions (Table 5). For human joint angle measurement, all measurement devices exhibited varying levels of interrater reliability across all joint angles, ranging from moderate to excellent (ICC between 0.697 and 0.975; SEM between 1.93° and 4.64°; and MDC between 5° and 11°). All devices exhibited lower interrater reliability when measuring wider ROMs, particularly in the fourth quarter of joint angles, showing moderate reliability (ICC between 0.680 and 0.744; SEM between 3.46° and 4.64°; and MDC between 7° and 11°) (Table 5). Analysis of intrarater reliability (Table 6), with each examiner and overall, demonstrated that all devices had excellent reliability (ICC = 0.977 to > 0.999) for each standardized angle and overall ROM. The DI had the lowest SEM and MDC for each standardized angle (0.56°-0.90° and 1°-2°, respectively), whereas MI had the highest SEM and MDC (1.04°-1.91° and 2°-4°, respectively). The UG, IC, and SA had SEM values within 1.43° and MDC values within 3°. All devices, except for DI, tended to have lower intrarater reliability (with greater

Discussion
The present study is the first to explore measurement errors, considering both device and examiner factors, with and without human factors. We conducted a thorough examination of measurement error using five goniometric devices, covering a range of available ROM from 0° to 180° across 12 standard measurement angles and 8 human shoulder joint flexion angles. Our findings can serve as reference values for the development of goniometric equipment, both before and after conducting studies on human joints, while also accounting for errors arising from the equipment and the examiner. As a primary objective, we conducted a detailed assessment of concurrent validity to investigate the impact of technology-based device designs on examiner performance. We compared four common measurement devices with UG, a standard clinical tool, across two phases: standard angle measurements and human joint angle measurements. Our analysis in both phases yielded ICC values exceeding 0.99 for all device pairs, demonstrating their reasonable validity 35. Additionally, the Bland-Altman plots for each device pair displayed even dispersion along the x-axis, with mean differences ranging from − 0.97 to 1.08 for the standard angle measurement phase and − 1.59 to 1.39 for the human joint angle measurement phase. These results suggest that the differences between the two instruments are consistent and not significantly different 39. In the standard angle measurement phase, our findings indicated consistency among each device pair. This aligns with the findings of prior research, confirming the potential of technology-based devices to replace UG without introducing significant variability 19,[40][41][42]. However, when measuring human joint angles, notable discrepancies among the devices were observed, highlighting the substantial influence of technology-based device designs on examiner performance in complex scenarios. Of particular significance, both SA and DI stood out because of their utilization of 
higher-precision embedded technology, which eliminates the need for examiners to read scales or maintain a final position for scale reading. This unique feature sets them apart from traditional measurement tools, such as IC and MI, significantly contributing to their superior performance in measuring human shoulder joint angles 18,41,43. In conclusion, our study highlights the potential of technology-based devices, particularly SA and DI, in replacing UG and improving measurement accuracy, particularly in complex scenarios, such as measuring human shoulder joint angles. These findings underscore the critical role of device technology and design in examiner performance.
To enhance precision and reliability in clinical measurements, considering these factors when selecting tools is crucial.Additionally, our insights suggest that the development of new goniometric devices with features eliminating the need for reading scales or allowing for fixed final scores for later reading could substantially reduce measurement errors in various research and clinical settings.
For the secondary objective, our reliability analysis consistently revealed excellent inter- and intrarater reliabilities (ICC > 0.90) for all standardized angles and the first three quadrants of the human shoulder flexion angle. However, in the last quadrant, reliability ranged from moderate to good for both intra- and interrater assessments 35. We consistently observed a decreasing trend in both intra- and interrater reliabilities as the measurement angles widened. This trend remained consistent across both phases for all devices, except for DI when measuring standard angles. Notably, this trend became more pronounced when measuring human joint angles. This aligns with the common understanding that measuring human joint angles involves a complex interplay of factors, including device, examiner, and individual-specific factors, resulting in greater measurement variability. These findings parallel the outcomes of our concurrent validity study, which revealed larger and more dispersed mean differences among device pairs at wider angles, particularly in the fourth quadrant. Wellmon et al. 19 explored interrater reliability for standardized acute, right, and obtuse angles. They reported differences in means for measurements performed using SA, suggesting the potential for clinically meaningful differences to arise when measuring angles > 90°, although they could not provide further clarification. Our results support the findings of Wellmon et al., as four of the goniometric devices exhibited the same trend, the exception being DI. This trend can be attributed to the alignment of the goniometric device's reference part. Notably, we observed that the reference part tended to shift more when the final position deviated significantly from the starting position. This shift primarily occurred due to substantial alterations in soft tissue tension during motion close to the end range, causing changes in arm shape and consequently affecting reference part alignment. It is imperative to highlight that our 
study uniquely addressed reliability at various joint angles, encompassing both standardized and human joint angles.However, this trend was not observed when using DI to measure all standard angles.This deviation may be because of the scale-free reading function and the wider width of the DI reference base, which makes it easier to align by placing it on the surface of the apparatus arms at all angles.The clinical implications of this finding suggest that when measuring joint angles across a wide range, it is critical to reconfirm reference part alignment for consistency with the starting position, particularly during significant posture changes in end-range motion.These insights hold promise for enhancing the accuracy and reliability of joint angle measurements in clinical and research applications, aligning with the goal of accurately reflecting clinical changes, such as treatment effectiveness or the progression of a condition.
No previous study has reported reference values for instrument-focused measurement error corresponding to common goniometric devices (ICC of inter-and intrarater reliabilities, concurrent validity, SEM, MDC, and 95% LOA).Such reference values are necessary for non-experimental studies on the development of novel prototype goniometric devices.Our report concurred substantially with the findings of previous studies.Chapleau et al. 23 examined the reliability and validity of UG compared with those of radiography for ROM measurement of healthy elbows.Regarding concurrent validity, a 95% LOA of ± 10.3 (or less) was reported.The ICC for the interrater reliability of UG ranged from 0.95 to 0.97.Wellmon et al. 19 studied concurrent validity and reliability by focusing on device and examiner factors and excluded patient factors.They reported an ICC of 0.999 for the concurrent validity of UG and IC, with a 95%LOA ranging from − 3.8 to 3.5.The interrater reliability of UG and IC was also excellent (ICC > 0.99).Hancock et al. 
8 examined the accuracy and reliability of five knee goniometric methods by supporting the limb to maintain knee angles during measurement.They reported excellent intrarater (ICC > 0.98) and interrater (ICC > 0.99) reliabilities, with the minimum significant differences ranging from 6° to 14°, for both short-and long-arm and laser projection-based digital goniometers.Kolber and Hanney 30 reported the interrater reliability of IC for identifying posterior shoulder tightness.Excellent reliability (ICC = 0.90) with an MDC of 9° and SEM within 4° was reported.UG, IC, and DI are commonly used in clinical practice 16 and have been recommended as the gold standard by numerous studies 8,18,19,29,31,44 .Therefore, measurement error metrics based on these three devices can be recommended as reference values.In the light of our findings, it can be concluded that ICC values for inter-and intrarater reliabilities should be > 0.90, SEM should not exceed 2°, and MDC should not be greater than 3°.In terms of concurrent validity, UG and IC set the reference device; ICC values should be > 0.90, SEM should not be greater than 1°, and 95% LOA should range from − 3° to 3°; these criteria can set the error limits for measuring standardized joint angles in non-experimental studies of goniometer prototypes.
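The reference criteria proposed above for standardized-angle testing can be written as a simple acceptance check for a prototype device. This is a sketch only: the function names are hypothetical, and treating "should not exceed" as an inclusive bound is an interpretation of the wording rather than something the text specifies.

```python
def meets_reliability_reference(icc, sem_deg, mdc_deg):
    """Check inter-/intrarater reliability against the proposed criteria:
    ICC > 0.90, SEM not exceeding 2 degrees, MDC not greater than 3 degrees.
    """
    return icc > 0.90 and sem_deg <= 2.0 and mdc_deg <= 3.0


def meets_validity_reference(icc, sem_deg, loa_lower_deg, loa_upper_deg):
    """Check concurrent validity (against UG or IC) with the proposed criteria:
    ICC > 0.90, SEM not greater than 1 degree, 95% LOA within -3 to 3 degrees.
    """
    return (
        icc > 0.90
        and sem_deg <= 1.0
        and loa_lower_deg >= -3.0
        and loa_upper_deg <= 3.0
    )


# Example inputs built from bounds reported in the Results for standard angles.
tight_pair_ok = meets_validity_reference(0.999, 0.9, -2.69, 3.00)
wide_pair_ok = meets_validity_reference(0.995, 1.32, -4.11, 4.04)
```

With these example inputs, the first call passes and the second fails, mirroring the distinction drawn in the Results between device pairs limited to UG, IC, and DI and pairs involving SA or MI.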
In the development of new goniometric devices, extending accuracy testing to include human joint measurements after assessing known angles is essential.This is important because of the variability among individuals, which can have a substantial impact on both the device's performance and the examiner's accuracy.Our findings showed that wider joint angles led to increased measurement errors, especially in human joint measurements.This is because of the complex interplay of factors, including tissue tension, changes in limb shape, and misalignment from the starting position.Considering these factors and the specific characteristics of each device, we must analyze the sources of measurement error, discuss control methods and furthermore make recommendations for developing more accurate clinical goniometric devices.Incorporating considerations of intra-and interrater reliability and concurrent validity in human joint measurements from our findings is crucial.
UG demands a high level of examiner skill and involves scale reading, although it does not require holding the final position for immediate scale reading (the final score can be fixed and read later). It necessitates aligning three anatomical points: the axis and both the stationary and movable arms, which places a premium on detailed anatomical identification. However, this feature is advantageous when realigning the zero starting position upon reaching the final position. Although this characteristic presents minimal challenges when measuring standard angles because of their clear and easily definable axes and arms, it poses difficulties in measuring human joints, particularly in large joint angle quadrants, where defining the axis and reference body parts becomes more intricate. The fluid-level inclinometer in this study, by contrast, requires both scale reading and stabilization of the final position for scale reading. Moreover, it has a short reference base, whereas previous studies identified a positive effect of extending the goniometer arm on measurement accuracy 8,45.
The DI demonstrated superior validity and intra- and interrater reliability when used to measure standard angles. However, it did not exhibit the same level of superiority when measuring human joints. The DI's wide reference base facilitated its placement on the standard angle arms but did not yield a similar positive effect when measuring human joints. On the other hand, the DI's short reference base and large size made alignment more challenging, particularly when measuring human joints near the end range. Contrary to previous research findings 27,28, our study showed that the DI's reliability for ROM assessment was lower than that of the UG. However, our findings indicated greater validity and reliability than those of Kolber et al. 18, who examined the reliability and concurrent validity of shoulder mobility measurements obtained using a DI compared with those obtained using a UG. They reported an SEM of 2° and an LOA of ± 11°, which are reasonable values for patient measurements. Our MDC values also indicated improved accuracy relative to those reported by Mohammad et al. 29, who noted MDC values ranging from 1.45° to 11.89° when assessing ROM in lower extremity joints. A direct comparison is challenging, however, owing to differences in angle sources, study populations, and the specific DI model used.
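The SEM and MDC values compared above are linked by simple formulas. The following is a minimal sketch, assuming the commonly used definitions SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM; the text does not state which formulas were applied in this study or in the cited studies, and the function names are hypothetical. Under these assumptions, an SEM of 2° corresponds to an MDC95 of roughly 5.5°.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement from the SD of the scores and the ICC.

    Assumed formula: SEM = SD * sqrt(1 - ICC).
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem: float) -> float:
    """Minimal detectable change at the 95% confidence level.

    Assumed formula: MDC95 = 1.96 * sqrt(2) * SEM.
    """
    return 1.96 * math.sqrt(2.0) * sem


if __name__ == "__main__":
    # SEM of 2 degrees, the value quoted above from Kolber et al.
    print(round(mdc95(2.0), 2))
```

Because the MDC scales linearly with the SEM under these definitions, differences in SEM between devices carry over directly to the smallest change that can be distinguished from measurement error.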
When measuring standardized angles, the MI exhibited higher measurement errors than the other devices. However, its ICC values for concurrent validity and for interrater and intrarater reliability exceeded 0.90, which indicates excellent agreement and is generally considered acceptable 35. The MI exhibited slightly inferior concurrent validity and intra- and interrater reliability when measuring both standardized and human joint angles. This could be attributed to the need for scale reading and for holding the final position while the scale is read. Moreover, the MI required only an initial reference setting (zero starting position) followed by a scale reading at the final position, which limits adjustments to the final alignment. Additionally, this difference in performance might be related to partially unstable fixation between the MI and the measurement apparatus. Body shape changes occur beneath the fixing apparatus because of the tension of the surrounding soft tissue. This differs from a previous study that measured neck movement with a fixing apparatus (tape) applied around the head, where the shape change during measurement was less pronounced 42. Clinically, applying the fixing apparatus to areas with minimal shape changes, such as bony prominences, is advisable to ensure more stable measurements.
For standardized angle measurements, the SA showed slightly decreased reliability, which is consistent with prior research highlighting design-related variability, particularly due to rounded edges. This finding agrees with that reported by Wellmon et al. 19, who investigated the concurrent validity and interrater reliability of the Goniometer Record and Goniometer Pro applications installed on various smartphones for measuring standardized angles. They considered the UG and IC as the reference standards. Their study revealed ICC values for concurrent validity (using both applications) exceeding 0.99 and 95% LOA within ± 4.05°, indicating strong agreement. Interrater reliability was excellent, with an ICC exceeding 0.99. They emphasized the influence of smartphone design on reliability, particularly when placing the smartphone's edge against a flat testing apparatus surface. When measuring human joint angles, the SA exhibited excellent concurrent validity, with an ICC exceeding 0.90, SEM within 3°, MDC within 7°, and 95% LOA within ± 10°. Furthermore, it demonstrated excellent intrarater reliability (ICC exceeding 0.90, SEM within 4°, and MDC within 7°) and strong interrater reliability (ICC exceeding 0.90, SEM within 5°, and MDC within 9°). These findings from the human testing phase highlight the superior validity and reliability of the SA compared with those of the other devices. This can be attributed to the embedded high-precision technology 46 and the technique employed, which aligns the smartphone reference lines with the position of the humerus, effectively mitigating variations caused by nonflat surfaces. Several factors likely contributed to these excellent results, including the absence of scale reading, the capability to establish references twice (initially at the zero starting position and later at the final position, with the option to adjust alignment in both instances), and the extended length of the smartphone's edge (long side), which enhanced alignment with the humerus 2.
In a study by Ockendon and Gilbert 47, the validity of a novel smartphone accelerometer-based goniometer was assessed by measuring 5°-45° of knee flexion deformity and comparing the results with those of a standard Lafayette goniometer. They reported a 95% LOA of ± 7.6°, indicating good agreement. However, earlier studies 48,49 have reported varying levels of validity and reliability, ranging from poor to excellent, when using Android and iPhone applications to measure cervical ROM among healthy participants. Chapeau et al. 23 conducted a noteworthy study on radiographic elbow measurements, reporting interrater ICC values ranging from 0.98 to 0.99. They recommended that a clinically acceptable maximal measurement error should not exceed 10°. In conclusion, both a gyroscope-based smartphone application (the Goniometer Records application) and a modified gravity pendulum inclinometer (MI) with a fixing apparatus proved suitable for measuring the feasible range of motion in clinical practice. However, when developing new clinical goniometric devices and challenging their validity and reliability, the SA should be considered a reference device for the human joint testing phase, bearing in mind its own set of challenges.
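The 95% LOA values quoted in this discussion follow the Bland–Altman approach. As a minimal sketch, assuming the usual definition (mean of the paired differences ± 1.96 × SD of the differences), the limits for a test device against a reference device could be computed as below. The function names are hypothetical, and comparing the limits against the 10° maximal error recommended by Chapeau et al. is our illustrative assumption rather than a procedure described in the text.

```python
import statistics
from typing import Sequence, Tuple


def limits_of_agreement(
    device: Sequence[float], reference: Sequence[float]
) -> Tuple[float, float]:
    """Bland-Altman 95% limits of agreement for paired angle readings (degrees).

    Assumed definition: mean difference +/- 1.96 * sample SD of the differences.
    """
    if len(device) != len(reference) or len(device) < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    diffs = [d - r for d, r in zip(device, reference)]
    mean_diff = statistics.fmean(diffs)
    half_width = 1.96 * statistics.stdev(diffs)
    return mean_diff - half_width, mean_diff + half_width


def within_max_error(loa: Tuple[float, float], max_error: float = 10.0) -> bool:
    """True if both limits lie within +/- max_error degrees (illustrative check)."""
    lower, upper = loa
    return max(abs(lower), abs(upper)) <= max_error
```

In use, `device` would hold the readings of the device under test and `reference` the paired readings of the comparison device (the UG in this study), taken on the same angles.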
The limitation of our study is that it focused solely on measuring shoulder flexion in one direction. Future studies should consider measuring motion angles in other directions and examining joints with pathological conditions. Additionally, the extracted data may subsequently be used in finite element studies to assist in developing more clinically accurate numerical simulations for bioengineering.

Conclusions
Our study provides insights into the ability of three examiners to accurately use five common clinical goniometers (i.e., UG, IC, SA, DI, and MI). It focused on device and examiner factors and considered their impact with and without human-specific factors, in order to derive reference values for error quantification and to clarify which targets apply when developing a new device for measuring ROM. Testing should start with an examination of known standard angles. We recommend that the ICC for reliability be greater than 0.90, the SEM be less than 2°, and the MDC be no greater than 3°. The most accurate and reliable goniometric measurement devices, in terms of all error metrics, were the DI for standardized angle measurements and the SA for human joint angle measurements. When developing a new clinical goniometric device and challenging its validity and reliability, the DI and SA should be considered as reference devices for testing standardized angles and human joint angles, respectively. For standardized angles, concurrent validity should meet the criteria of ICC greater than 0.90, SEM less than 1°, MDC within 2°, and 95% LOA within ± 3°. For human joint angles, concurrent validity should meet the criteria of ICC greater than 0.90, SEM less than 3°, MDC within 7°, and 95% LOA within ± 10°. Factors such as the absence of scale reading, the inclusion of a function for fixing the final scale reading, and a sufficiently long reference part may play crucial roles. Moreover, we found dissimilar inter- and

Figure 1. Example of standard measurement angles and human shoulder flexion angles. Starting and final measurement positions for: (a and b) standard measurement angle; (c and d) human shoulder flexion angle.

Figure 2. Goniometric devices and procedures for measuring standard angle.Starting and final measurement positions for: (a and b) universal goniometer; (c and d) inclinometer; (e and f) smartphone application; (g and h) digital inclinometer; (i and j) modified inclinometer.

Figure 3. Goniometric devices and procedures for measuring human shoulder flexion angle. Starting and final measurement positions for: (a and b) universal goniometer; (c and d) inclinometer; (e and f) smartphone application; (g and h) digital inclinometer; (i and j) modified inclinometer.

Table 1. Comparative analysis of the characteristics of five common clinical goniometric devices.

Table 2. Mean and standard deviation (SD) of the angle measured by five goniometric devices.

Table 3. Mean and standard deviation (SD) of human shoulder range of motion (ROM) measured using five goniometric devices.

Table 4. Statistical summary of agreement of all goniometric measurement devices. ROM range of motion, ICC intraclass correlation coefficient, MD mean of differences, SDMD SD of MD, SEM standard error of measurement, MDC minimal detectable change, 95% LOA 95% limits of agreement.

Table 6. Intrarater reliability metrics for all goniometric measurement devices. ROM range of motion, ICC intraclass correlation coefficient, 95% CI 95% confidence interval, SEM standard error of measurement, MDC minimal detectable change, UG universal goniometer, IC inclinometer, SA smartphone application, DI digital inclinometer, MI modified inclinometer, Q1 0°-45°, Q2 > 45°-90°, Q3 > 90°-135°, Q4 > 135°-180°.

human joint angles. Studies have typically focused on measuring the entire ROM for each joint direction. Note that although the study by Handcook (2018) 8, a frequently cited literature source, measured three angles of the knee joint, it did not report reliability values for each angle separately. This divergence poses a challenge when comparing our findings with those of previous studies. Furthermore, Wellmon et al. (2016)