Introduction

A pressure injury (PI) is defined as a “localized injury to the skin and/or underlying tissue, usually over a bony prominence, resulting from pressure or a combination of pressure and shear” [1]. PIs are one of the most frequently encountered complications in people with spinal cord injury (SCI) [2]. A quarter of individuals with SCI will develop a PI prior to inpatient rehabilitation [3,4,5,6,7], and prevalence at rehabilitation admission has ranged from 10 to 32% [8, 9]. During inpatient SCI rehabilitation, reported incidence has ranged between 10 and 48% [4, 8,9,10,11,12], and a recent study from the United States revealed that 41% of a cohort of 169 persons with SCI had at least one PI within the first year following their discharge from rehabilitation [13].

PIs have well-documented consequences, including an important monetary cost, functional loss, and subsequent health problems such as depression [2]. They have been proven to be associated with diminished quality of life among persons in the chronic phase after SCI [14]. Because PI can be avoided, it is essential to detect individuals at risk of developing this condition and promptly implement measures to prevent them from occurring. A central component of prevention is risk assessment. A structured approach that includes the use of a risk assessment scale in combination with a skin assessment, a mobility/activity assessment as well as clinical judgment is recommended [15]. The Canadian Best Practice Guidelines for the Prevention and the Management of Pressure Ulcers in People with SCI state that the “assessment of pressure ulcer risk is more effectively performed using an objective risk assessment tool than clinical judgment”. The Guidelines also state “numerous tools exist, not all of which have been validated for use on people with spinal cord injury”.

A review by Mortensen et al. [16] of seven PI risk assessment scales found that their validity following SCI ranged from poor to adequate. Reliability was not reported. Of these seven scales, only two were specifically designed for the SCI population. They are the Spinal Cord Injury Pressure Ulcer Scale (SCIPUS) and its version for the acute care setting, the SCIPUS-A [16]. The authors of the review concluded that although the validity of these two scales was adequate, their use could not be recommended due to important methodological limitations in the validation study and the lack of information regarding reliability [16].

Since then, few studies have tested the metric properties of the SCIPUS. The authors participated in one such study, which tested the performance of the SCIPUS in persons with SCI in a rehabilitation setting [17]. The results indicated that inter-rater reliability was excellent for composite SCIPUS scores and very good for risk stratification [17]. The sensitivity of the tool was excellent, but its specificity was poor. More recently, Krishnan et al. [18] found that the SCIPUS adequately predicted the risk of pressure injury only during a specific timeframe, namely within 2–3 days of admission to an acute care setting. The SCIPUS was unable to predict PI in a rehabilitation setting.

The aim of this study is to (1) assess additional metric properties of the SCIPUS in its current format, with a focus on validity using the partial credit Rasch measurement [19] and (2) propose modifications to improve performance.

Methods

The study employed a cross-sectional design and was conducted in two Canadian rehabilitation centers, the Toronto Rehabilitation Institute-University Health Network (TRI-UHN) (Toronto, ON) and the Institut de réadaptation Gingras-Lindsay-de-Montréal (IRGLM) (Montreal, QC) of the Centre intégré universitaire de santé et de services sociaux du Centre-Sud-de-l’Île-de-Montréal (CCSMTL). Data were collected as part of the Spinal Cord Injury Knowledge Mobilization Network (SCI KMN) initiative [20]. The SCI KMN is a community of practice comprised of seven rehabilitation centers across Canada. The SCI KMN has supported the implementation of best practices in SCI rehabilitation, including PI prevention, using implementation science principles. As part of this initiative, the SCIPUS was selected and implemented at both sites (Toronto and Montreal).

The SCIPUS is a 15-item clinician-administered scale specifically developed and designed to assess the risk of developing a PI in persons with SCI. The majority of items are scored dichotomously as either present or absent (0/1 or 0/2). Four items have three response options that have weighted scores (0/1/4 or 0/1/3) (see Appendix 1). A summary score is calculated by adding the scores of individual items. Scoring ranges from 0 (best prognosis) to 25 (worst prognosis), with a high-risk cut-off score of ≥6 [16].
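
As a minimal illustration of the scoring rule described above, the following Python sketch sums item scores and applies the ≥6 high-risk cut-off. The item labels and example scores are hypothetical placeholders chosen for illustration; they do not reproduce the item weights given in Appendix 1.

```python
HIGH_RISK_CUTOFF = 6  # high-risk cut-off stated in the text (score >= 6)
MAX_TOTAL = 25        # maximum summary score stated in the text


def scipus_total(item_scores):
    """Sum the (already weighted) item scores into a summary score."""
    total = sum(item_scores.values())
    if not 0 <= total <= MAX_TOTAL:
        raise ValueError(f"total {total} is outside the 0-{MAX_TOTAL} range")
    return total


def is_high_risk(total):
    """Apply the >= 6 cut-off to a summary score."""
    return total >= HIGH_RISK_CUTOFF


# Hypothetical ratings for one person (labels and values are illustrative only).
example = {"activity": 4, "mobility": 1, "smoking": 1, "cardiac_disease": 0}
total = scipus_total(example)
print(total, is_high_risk(total))
```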

The SCIPUS was completed within 72 h of inpatient admission. All newly admitted persons with a spinal cord injury aged 18 years or older were included in the project (traumatic and non-traumatic), regardless of time since injury. Additional socio-demographic characteristics that were collected included nature of the lesion, gender and presence of PI at admission.

Statistical analyses

Rasch analyses of the SCIPUS were performed to assess aspects of validity using RUMM2030 software (RUMM Laboratory, Perth, Australia). The Rasch model describes the relationship between an item and a person’s response or performance on this item. The Rasch model is a way to convert ordinal measures into interval-like, meaningful measures of a single latent trait [21]. In this study, the latent trait is defined as the risk of developing a PI.

When using Rasch analysis, the person is positioned along the continuum of latent trait. The items are also positioned along this continuum. As a result, Rasch analysis creates an interval-like measure that allows for the measurement of true differences between and within individuals of the risk of PI development. Once the items are calibrated on the continuum of the latent trait and items meet the expectations of the Rasch model, they are said to “fit” the model [21]. Once items fit the Rasch model, we have evidence that the items composing the SCIPUS form a real measure of the risk of developing a PI and that the addition of scores for each individual yields a true quantity of risk.
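
To make the idea of positioning persons and items on the same continuum concrete, the sketch below evaluates the standard dichotomous Rasch model, in which the probability of scoring 1 depends only on the difference between the person location and the item location (both in logits). The function name and the example values are illustrative assumptions and are not taken from the RUMM2030 analysis.

```python
import math


def rasch_probability(theta, b):
    """Probability of scoring 1 on a dichotomous item under the Rasch model.

    theta: person location (here, level of PI risk) in logits
    b:     item location in logits
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# When person and item share the same location, the probability is 0.5.
print(rasch_probability(0.0, 0.0))
# A person located 1 logit above the item is more likely to score 1.
print(rasch_probability(1.0, 0.0))
```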

Assumptions of the Rasch model must be verified to ascertain fit to the model. First, unidimensionality implies that all items comprising a scale measure one unique dimension (trait). Second, local independence [21] must be verified, that is, the items should not be related to each other once the effect of the latent trait has been removed. An additional assumption that requires verification is threshold ordering; all response categories should demonstrate the highest probability of being endorsed at different levels of difficulty. Thresholds are those points along a theoretical continuum of item difficulty where a person is equally likely to score in either of two adjacent categories (0 or 1, and 1 or 2, respectively).
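
The definition of a threshold can be illustrated with the partial credit model for a three-category item. The sketch below computes category probabilities from a person location and a list of thresholds, and shows that two adjacent categories are equally likely when the person sits exactly at a threshold. The threshold values are hypothetical and are not estimates from this study.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities for one item under the partial credit model.

    theta:      person location in logits
    thresholds: threshold locations (tau_1, ..., tau_m) in logits
    Returns a list of probabilities for scores 0..m.
    """
    # Cumulative sums of (theta - tau_j); score 0 has an empty sum of 0.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    weights = [math.exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical ordered thresholds for a three-category item (scored 0/1/2).
taus = [-1.0, 1.5]
p = pcm_probabilities(-1.0, taus)  # person located at the first threshold
print(p[0], p[1])                  # scores 0 and 1 are equally likely here
```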

The presence of differential item functioning (DIF) should also be assessed. Items with DIF demonstrate different probabilities depending on the group of persons being assessed (e.g., men vs. women) and violate the property of invariance inherent to the Rasch model. This means that for the same level of risk, scores on the items should not differ based upon differing groups such as gender or age. These items must be deleted or split (i.e., a different score is given depending on the person factor).

In Rasch analysis, fit statistics are fundamental and allow for the verification of fit of the scale and its composing items to the model and the ascertainment of the assumptions. First, the global model fit of the scale is verified by a non-significant item–trait interaction. Fit of the individual items to the model is assessed using different criteria: the fit statistics, the item characteristic curves (ICCs) (DIF analysis) and a principal component analysis of the Rasch model item residuals, which is a further test of unidimensionality and local independence. Items are considered to fit when their standardized fit residuals lie between −2.5 and +2.5 and their chi-square and F-statistics are non-significant.
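
The item-level criteria above can be expressed as a simple check. This is only a sketch: the function name is hypothetical, the significance level of 0.05 is an assumption (the level applied to item fit is not stated here), and the example values are invented for illustration.

```python
def item_fits(fit_residual, chi_square_p, f_stat_p, alpha=0.05, limit=2.5):
    """Return True when an item meets all three fit criteria.

    alpha is an assumed significance level; limit is the +/-2.5 residual bound.
    """
    residual_ok = -limit <= fit_residual <= limit
    chi_square_ok = chi_square_p >= alpha  # non-significant chi-square
    f_stat_ok = f_stat_p >= alpha          # non-significant F-statistic
    return residual_ok and chi_square_ok and f_stat_ok


# Hypothetical items: one well-fitting, one with a large positive residual.
print(item_fits(0.8, 0.40, 0.35))
print(item_fits(5.4, 0.001, 0.002))
```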

Only when the data fit the model do the characteristics of the model hold true [22]. If parts of the data do not fit the model, a decision to modify (e.g., rescoring items by collapsing response categories) or reject part of the data (e.g., deleting misfitting items) needs to be undertaken. These methods and their criteria are fully described elsewhere [19, 21, 23, 24].

Results

Data from 886 participants were analyzed, approximately 60% of whom were males. Table 1 presents the clinical and socio-demographic characteristics of the study sample.

Table 1 Participant characteristics

Overall fit of the SCIPUS

One-unit interval coding was utilized to analyze the data, as it was not possible to “weight” the items as prescribed by the scale (for example, the original coding of item 4, level of activity, is 0/1/4 and was entered in RUMM as 0/1/2). The fit of the data when 15 items of the SCIPUS were analyzed produced a significant item-trait interaction (chi-square = 226.82; p < 0.05; df = 112). This is an indication that not all of the items fit the Rasch model and construct validity could thus not be ascertained. Item 13 (individual in a Nursing Home or Hospital) was automatically excluded from the analysis since all study participants obtained the same score, as they were admitted to a rehabilitation center.

Thresholds

None of the items displayed disordered thresholds. Therefore, no re-coding or collapsing of response options was necessary.

Individual item fit

Item 7—Tobacco use/smoking did not fit the model with a standardized residual of +5.43 and significant chi-squares and F-statistics. All remaining items displayed adequate fit statistics.

Differential Item functioning

The variables that were examined for the presence of DIF were age and gender. In addition, to explore the possibility of discrepancies in scoring between the two sites (Montreal vs. Toronto), study site was also tested for DIF. DIF was deemed to be present if analyses of variance were significant (Bonferroni-corrected; p < 0.0014). Several items displayed DIF (see Table 2). Two items displayed DIF by age (autonomic dysreflexia/spasticity, cardiac disease/ECG). Of importance, several items also displayed DIF by site (Montreal and Toronto). These items are pulmonary disease, cardiac disease, cognitive impairments, albumin level, and hematocrit.
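
A Bonferroni correction divides the nominal significance level by the number of tests performed. The sketch below shows the arithmetic. The number of tests behind the 0.0014 threshold is not stated in the text; the value of 36 used here is purely an assumption, chosen because 0.05/36 rounds to 0.0014.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


n_tests = 36  # assumed number of tests; not reported in the text
threshold = bonferroni_threshold(0.05, n_tests)
print(round(threshold, 4))
```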

Table 2 Items displaying DIF according to person factors age and site

Reliability

Scale reliability of the SCIPUS was also examined using Rasch. In RUMM2030, the reliability index, also called the person separation index (PSI), is interpreted as a Cronbach’s alpha. Cronbach’s alpha is a measure of internal consistency, that is, how closely related a set of items is as a group. It is considered to be a measure of scale reliability. In this context, PSI indicates how well the items can discriminate persons with different levels of ability (PI risk) [21] with an estimate >0.8 deemed satisfactory [25]. The PSI for the remaining 14 items of the SCIPUS was 0.44, which is considered low. Therefore, the internal consistency is inadequate, indicating that not all of the items of the SCIPUS discriminate well between persons that have different levels of risk for developing a PI.
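
For readers unfamiliar with the PSI, it is commonly described as the proportion of observed variance in person estimates that is not attributable to measurement error. The sketch below implements that ratio under this assumption; the person locations and standard errors are invented for illustration, and the exact computation used by RUMM2030 may differ in detail (for example, in how the variance is estimated).

```python
from statistics import mean, variance


def person_separation_index(locations, standard_errors):
    """PSI = (observed variance - mean error variance) / observed variance."""
    observed_var = variance(locations)  # sample variance of person estimates
    error_var = mean([se ** 2 for se in standard_errors])
    return (observed_var - error_var) / observed_var


# Hypothetical person locations (logits) and their standard errors.
locations = [-2.0, -1.0, 0.0, 1.0, 2.0]
precise = person_separation_index(locations, [1.0] * 5)
imprecise = person_separation_index(locations, [1.5] * 5)
print(precise, imprecise)  # larger standard errors give a lower PSI
```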

Targeting and content validity

Content validity is achieved when items are spread evenly along the measured latent trait, “risk of developing a PI”, and cover a wide range of risk from the lowest to the highest possible risk (range from −3 to +3 logits) (see Fig. 1). The 14 remaining items of the SCIPUS range from approximately −3 to +2 logits, which is slightly narrower than the ideal range, meaning that the SCIPUS does not cover the entire range of “risk of developing a PI”. Content validity is also demonstrated through proper targeting of the items and the absence of gaps along the latent trait continuum [26]. This type of evidence was investigated through the item-person map (Fig. 1). As depicted in Fig. 1, the items are not clustered in the same area but rather are spread out along the continuum. Spread indicates that the items are not redundant and measure different levels of risk. The items do spread further to the right than to the left, indicating some mis-targeting of the population under study. Because the item thresholds fail to spread to the left beyond −3, persons located below this value (those who are at lower risk) are not well targeted by the measure. Also, the mean location score obtained for the items is −1.165. Ideally, this should be closer to zero, the default value that should represent an “average risk”.

Fig. 1

Item-person threshold distribution map of the SCIPUS (item 13 omitted). The horizontal axis, scaled in logits, represents the lowest risk of developing a PI at the left to highest risk at the right and the vertical axis denotes the proportion of subjects or items. The bars represent the distribution of subjects (upper half of graph) and items (lower half of graph) at each location. The item thresholds spread from approximately −3 to 2 logits.

Unidimensionality and local independence

Unidimensionality and local independence were assessed through the principal component analysis of the residuals. The first component explained 15% of the variance, which may indicate the presence of more than one dimension. To further assess whether the scale was unidimensional, we performed a post-hoc test of unidimensionality. According to Smith’s t-tests [27], 24 out of 883 t-tests (2.71%) showed significant differences in the estimates generated. Unidimensionality and local independence are ascertained if no more than 5% of the t-tests have values outside ±1.96 [28]; this criterion was therefore met.
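
The decision rule for the post-hoc test reduces to comparing a proportion against the 5% criterion. The sketch below applies it to the counts reported above; the function name is hypothetical.

```python
def proportion_significant(n_significant, n_tests):
    """Proportion of person-level t-tests falling outside +/-1.96."""
    return n_significant / n_tests


# Counts reported in the text: 24 significant t-tests out of 883.
prop = proportion_significant(24, 883)
print(prop <= 0.05)  # True when the 5% criterion is satisfied
```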

Proposing modifications for improvement

Following the initial Rasch modeling of the 14-item SCIPUS, further investigation was undertaken to determine if the SCIPUS could be modified to improve the fit to the model. First, item 7—Smoking was removed. In addition, item 6—Age was removed as it is not a latent trait in itself, so it is not considered amenable to being analyzed through an item response theory model. Indeed, a latent trait, in this context, should be a construct representing “qualities” or “states” of individuals that are not directly observed but rather inferred through questionnaires or tests.

These deletions produced a non-significant item-trait interaction (chi-square = 113.82; p = 0.10; df = 96), an indication that the measure now met this particular requirement of the Rasch model. Only items 2 (Mobility) and 1 (Activity) demonstrated a slight misfit to the model, with fit residuals of 2.89 and 3.11, respectively (see Table 3). Further examination of the metric properties of the 12 remaining items demonstrated similar characteristics in terms of content and construct validity and slightly higher reliability (PSI = 0.48). The same items demonstrated DIF for age, gender, and site.
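
The two item-trait interaction results reported in the text (chi-square = 226.82 with df = 112 for the initial analysis, and chi-square = 113.82 with df = 96 after the deletions) can be checked approximately without statistical software. The sketch below uses the Wilson-Hilferty normal approximation to the chi-square distribution, which is a substitute for the exact computation performed by RUMM2030, so the resulting p-values are approximate.

```python
import math


def chi_square_p_value_approx(chi_square, df):
    """Approximate upper-tail p-value via the Wilson-Hilferty transformation."""
    h = 2.0 / (9.0 * df)
    z = ((chi_square / df) ** (1.0 / 3.0) - (1.0 - h)) / math.sqrt(h)
    # Upper-tail probability of a standard normal variate.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


p_initial = chi_square_p_value_approx(226.82, 112)  # initial analysis
p_modified = chi_square_p_value_approx(113.82, 96)  # after removing items 6 and 7
print(p_initial < 0.05, p_modified > 0.05)
```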

Table 3 Individual item fit of remaining 12 items of the SCIPUS

Discussion

The objective of this study was to assess further the metric properties of the SCIPUS in its current format, with a focus on validity using the partial credit Rasch measurement model. Results could guide modifications to potentially improve the instrument.

Rasch analysis demonstrated that the SCIPUS, in its current format, does not meet the criteria required for true measurement, and therefore the total score must be interpreted with caution when assessing the risk of developing a PI in a population of individuals with SCI (non-traumatic and traumatic) during inpatient rehabilitation. First, item 13 (nursing home or hospital) presents no variance among persons who have been admitted to a rehabilitation center and thus should not be used in this particular context.

Results indicate that the item “smoking” does not fit the Rasch model, whereas the remaining items display adequate fit. The misfit may be due to the fact that it is not the smoking itself that puts a person at risk of developing a PI but rather its effect on cardiovascular and lung functions, among others, and those are assessed through other items present in the scale. The item “smoking” was therefore removed from further analyses, thus improving the performance of the model considerably.

The spread of items along the measured latent trait (PI risk) indicated that the SCIPUS has a floor effect. Because the item thresholds fail to spread to the left (Fig. 1), persons with a lower risk of developing a PI are not well targeted by the measure and therefore their risk is not estimated with precision. Furthermore, the item cardiac disease displayed DIF by age. This is important because it indicates that for a similar level of risk, persons should be scored differently for this item, depending on which group they belong to, and failure to do so may result in individuals being wrongly categorized. Consequently, appropriate prevention measures might not be put into place when required.

Several items also displayed DIF by site (Toronto and Montreal): pulmonary disease, cardiac disease, cognitive impairments, low albumin level, and low hematocrit. This is important because it indicates that the criteria for the scoring options differ between sites or are not interpreted in the same way. For items that involve blood tests, the laboratories analyzing the results may have different guidelines and cut-offs. For more “subjective” items, which require interpretation by the assessor, there is the possibility that scoring options were interpreted differently at the two study sites. This could have implications for the generalizability of the SCIPUS in different contexts. Language may have been an additional issue, as the SCIPUS was predominantly administered in French at the Montreal site. Therefore, a thorough analysis of the translation of the scoring category descriptions should be considered.

Perhaps the most important aspect to consider is the low PSI and the inability of the items to distinguish between different levels of PI risk. This may be due to the fact that most of the items are dichotomous (i.e., they are scored 0 or 1) and therefore lack the precision required to differentiate between different levels of risk. This hypothesis is strengthened when one examines the frequency of response categories that were endorsed. For most items, the endorsement of response categories is highly skewed, meaning that items were scored the same for a very high percentage of the sample. In the authors’ opinion, it would be advisable to review the response options and consider increasing the number of scoring options for each item while at the same time improving the operationalization of their definitions. This conclusion is supported by a prior study from Mortensen and colleagues [16], which found that some items of the SCIPUS and SCIPUS-A are not operationalized well and that clearer response descriptions would aid clinicians and potentially increase the scales’ reliability and validity.

Clinical recommendations

Despite the limitation caused by not being able to consider the weights of certain items of the scale, this analysis indicates that, in its original form, the SCIPUS total score should be used cautiously. Rasch analysis has demonstrated that the items which comprise the SCIPUS, as currently formatted, may not accurately determine PI risk in individuals with SCI participating in inpatient rehabilitation. This is in agreement with an earlier study by the authors, which found that the SCIPUS had poor specificity [17]. If the SCIPUS is used, we suggest clinicians emphasize the qualitative aspects of the items rather than relying on the scores, as the individual items could potentially provide insight into factors contributing to PI. Furthermore, modifications to the current items or the replacement of certain items could potentially ensure fit to the model and a true measurement of PI risk by the summation of the individual item scores. Addressing the DIF between sites and adding response options to some of the items could improve its performance given the low PSI. Following these changes, the scale will have to be validated by Rasch analysis as well as traditional metric standards of validity and reliability.

Data archiving

The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.