Psychometric assessment and validation of the dysphagia severity rating scale in stroke patients

Post-stroke dysphagia (PSD) is common and associated with poor outcome. The Dysphagia Severity Rating Scale (DSRS), which grades the severity of dysphagia based on fluid and diet modification and supervision requirements for feeding, is used for clinical research but has limited published validation information. Multiple approaches were taken to validate the DSRS, including concurrent and predictive criterion validity, internal consistency, inter- and intra-rater reliability and sensitivity to change. This was done using data from four studies involving pharyngeal electrical stimulation in acute stroke patients with dysphagia, an individual patient data meta-analysis and unpublished studies (NCT03499574, NCT03700853). In addition, consensual and content validity and the Minimal Clinically Important Difference (MCID) were assessed using anonymous surveys sent to UK-based Speech and Language Therapists (SLTs). Scores for consensual validity were moderate (62.5–78%) to high or excellent (89–100%) for most scenarios. All but two assessments of content validity were excellent. In concurrent criterion validity assessments, DSRS was most closely associated with measures of radiological aspiration (penetration aspiration scale, Spearman rank rs = 0.49, p < 0.001) and swallowing (functional oral intake scale, FOIS, rs = −0.96, p < 0.001); weaker but statistically significant associations were seen with impairment, disability and dependency. A similar pattern of relationships was seen for predictive criterion validity. Internal consistency (Cronbach’s alpha) was either “good” or “excellent”. Intra- and inter-rater reliability were largely “excellent” (intraclass correlation >0.90). DSRS was sensitive to positive change during recovery (medians: 7, 4 and 1 at baseline and 2 and 13 weeks respectively) and in response to an intervention, pharyngeal electrical stimulation, in a published meta-analysis. The MCID was 1.0 and DSRS and FOIS scores may be estimated from each other. The DSRS appears to be a valid tool for grading the severity of swallowing impairment in patients with post-stroke dysphagia and is appropriate for use in clinical research and clinical service delivery.

dysphagia based on oral intake, such as the functional oral intake scale (FOIS) 4 , and the dysphagia severity rating scale (DSRS) 5 .
The DSRS is a clinician-rated scale that was developed from the dysphagia outcome and severity scale (DOSS) 6 . It grades how severe clinical dysphagia is by quantifying how much modification is required to fluids and diet, as well as the level of supervision, for safe oral intake. The DSRS comprises three subscales that are totalled to give a score ranging from 0 (best) to 12 (worst). The subscales are five-level ordinal assessments of fluid and dietary intake and supervision; each ranges from normal (score 0) to no intake (4) (Table 1).
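The scoring rule described above can be illustrated with a short sketch. This is not part of the published scale: the class and field names (`DSRSScore`, `fluids`, `diet`, `supervision`) are hypothetical, and only the 0 to 4 subscale range and the 0 to 12 total are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DSRSScore:
    """One DSRS rating: three ordinal subscales, each 0 (normal) to 4 (no intake).

    Field names are illustrative assumptions, not official DSRS terminology.
    """
    fluids: int
    diet: int
    supervision: int

    def __post_init__(self):
        # Reject anything outside the five-level ordinal range of each subscale.
        for name in ("fluids", "diet", "supervision"):
            value = getattr(self, name)
            if not isinstance(value, int) or not 0 <= value <= 4:
                raise ValueError(f"{name} must be an integer from 0 to 4, got {value!r}")

    @property
    def total(self) -> int:
        """Total DSRS score: 0 (best) to 12 (worst)."""
        return self.fluids + self.diet + self.supervision
```

A rating with all three subscales at 4 gives the maximum total of 12, and all three at 0 gives the minimum of 0.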
As with the DOSS, which ranked independence levels according to the Functional Independence Measures (FIM) model and was linked to severity 7 , supervision on the DSRS was also divided into independence levels; however, the DSRS does not require a videofluoroscopy (VFS) to be performed. To date, the DSRS has been used in several published trials of PSD 5,[8][9][10] . The DSRS is copyright free and open access for research use. The aim of the present study was to test and describe the validity of the DSRS in patients with recent stroke. Consensual, content, concurrent criterion, and predictive criterion validity, and internal consistency, inter- and intra-rater reliability, sensitivity to change, and minimal clinically important difference were each assessed. Additionally, the relationship with the FOIS, a validated dysphagia scale 4 , was examined.

Approvals, informed consent and ethical approval. This validation study of the DSRS used a mix of prospectively-collected data from completed clinical studies and newly collected prospective data; in each case, non-attributable anonymised data were analysed. The completed trials each had national ethics approvals and patients (or surrogates) had given written informed consent, this covering subsequent data analyses; an individual patient data meta-analysis has already been published using the three pilot trials. For survey data, the University of Nottingham Faculty of Medicine Research Ethics Committee assessed that a full review by the committee was not indicated as the requests were distributed via professional networks; participation in the surveys was voluntary and anonymous and all data collection was performed in accordance with relevant guidelines and regulations set out by the University. Clinical audit data were collected by members of the clinical team and did not need research ethics approval. The authors will share a subset of anonymised individual patient trial data with the international VISTA Collaboration 11 .
Validation. Multiple approaches were taken to validate the DSRS, including determining consensual, content, concurrent criterion and predictive criterion validity 12 . Additionally, internal consistency, inter- and intra-rater reliability, sensitivity to change, and minimal clinically important difference (MCID) were determined.
Data sources. Validation assessments used data from all trials and unpublished studies that are currently known to have used the DSRS as an outcome. These included raw data from published trials of pharyngeal electrical stimulation (PES) 5,[8][9][10] , an individual patient data meta-analysis of the first three of these PES trials 13 , and unpublished studies (NCT03499574, NCT03700853). All studies involved patients with acute and/or sub-acute stroke.
Consensual and content validity were assessed from an anonymous survey sent to 20 UK-based Speech and Language Therapist (SLT) experts with experience of working with adults with acquired dysphagia. Relevant additional information was provided to the respondents regarding the background and purpose of the scale, and what patient group it was designed for. Similarly, to establish the MCID, an anonymous survey was distributed to a number of UK professional networks of SLTs with experience of working with adults with acquired dysphagia.
Consensual validity. This is the validity of a test determined by its general acceptance in the community of users, or by the number of users who judge it to be valid. Consensual validity was assessed by asking respondents to rate 5 scenarios using the DSRS, as recently used in validation of the International Dysphagia Diet Standardisation Initiative (IDDSI) functional diet scale 14 . Scenarios required respondents to rate recommendations of full amounts of oral intake, minimal and consistent oral trials, liquid only diets and accompanying levels of supervision. Respondents were asked to provide additional comments at the end of the survey. Excellent or good agreement was considered acceptable.
Content validity. This refers to the extent that a test includes all aspects of its construct, including relevance and comprehensiveness. Relevance was assessed using the content validity index (CVI), an indicator of inter-rater agreement that asks experts to appraise how relevant items are; 15,16 it is particularly appropriate to use on instruments that have scales with multiple items 15,16 . Experts considered the relevancy (score 1 for not relevant, to 4 [...]
Inter/intra-rater reliability. These are the degree of agreement among raters, and among repeated measurements by one rater, respectively. Inter-rater and intra-rater reliability were assessed by JB and AH using the same audit data as used for internal consistency. Both measures of reliability were assessed using the intraclass correlation (ICC).
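As an illustration of how an intraclass correlation can be computed from paired ratings, the following is a minimal sketch. The text does not state which ICC form was used, so the choice here of the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), is an assumption, as is the function name `icc_2_1`.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per patient, each row holding one score
    per rater (or per rating occasion, for intra-rater reliability).
    The ICC form is an assumption; the source does not specify it.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 patients and 2 ratings per patient")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: patients (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-patient mean square
    msc = ss_cols / (k - 1)                 # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square

    denominator = msr + (k - 1) * mse + k * (msc - mse) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (msr - mse) / denominator
```

When two raters give every patient the same total DSRS score, the residual and between-rater mean squares are zero and the function returns 1. A consistency form, ICC(3,1), would drop the between-rater term from the denominator.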
Sensitivity to change. This is also known as responsiveness 21 and refers to how well an instrument identifies longitudinal changes, in a proportionate manner 16 . Changes in the DSRS during the rehabilitation phase after stroke, i.e., from study baseline to final follow-up, were assessed using data from the STEPS trial.

Minimal clinically important difference (MCID).
The MCID is the minimum difference in a score that is considered valuable and changes patient management 22 . MCID was assessed in three different ways: statistical distribution (both half standard deviation and standard error of mean), anchor, and consensus through a survey [23][24][25] . Data for analysis of statistical distribution and anchor methods came from the STEPS trial and an individual patient data meta-analysis of three pilot trials of PES 9,13 . The survey involved UK-based SLTs. It was sent to a number of professional networks, and whether it was forwarded was at the discretion of the network administrators.
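The two distribution-based estimates named above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: the function name `distribution_based_mcid` is hypothetical, the sample (n − 1) standard deviation is assumed, and the standard error of the mean is taken as SD/√n.

```python
import math
import statistics


def distribution_based_mcid(scores):
    """Return two distribution-based MCID estimates for a list of DSRS scores.

    half_sd: half the sample standard deviation.
    sem:     standard error of the mean, SD / sqrt(n).
    Using the sample (n - 1) SD is an assumption; the source does not say.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    sd = statistics.stdev(scores)
    return {
        "half_sd": 0.5 * sd,
        "sem": sd / math.sqrt(len(scores)),
    }
```

The anchor-based and survey-based estimates depend on an external criterion and on clinician responses respectively, so they are not covered by this sketch.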

Relationship between DSRS and FOIS.
The DSRS and FOIS measure overlapping aspects of clinical dysphagia, although their directions of severity are opposite. Their relationship and interconversion were determined through mapping equivalent levels and using data from studies that measured both in parallel. Where a range of values was estimated, the median of these is given.

Statistical analyses.
In addition to the specific analyses detailed above, standard approaches were used to present results as number (%), median [interquartile range, IQR] or mean (standard deviation, SD).

Results
Trial individual patient data. [...] (Supplementary Table I); for all supplementary tables, please see the online resource. The mean age was 71 (SD 12) years with 109 (38%) female, mean onset to randomisation of 21 (SD 17) days; the most common clinical syndrome was partial-anterior circulation, 92 (43%) and just 3 (1%) patients had a posterior syndrome; 211 (85%) participants had an ischaemic stroke and 38 (15%) an intracerebral haemorrhage. A ceiling effect was noted at baseline with 139 (48%) patients having a maximum DSRS score of 12. Increasing dysphagia impairment, assessed using the DSRS, was significantly associated with time from onset to randomisation, worse neurological deficit (NIHSS), stroke type, dependency (modified Rankin scale), disability (Barthel index), swallow screening (component score on TOR-BSST), radiological aspiration (PAS) and non-oral feeding state (Supplementary Table I).
Consensual validity. As Speech and Language Therapists (SLTs) are the primary clinicians who treat dysphagia in the UK, anonymous surveys were sent to 20 invited UK-based SLT clinicians. Between eight and ten respondents rated each scenario. Seventy percent of respondents had 10+ years' experience. The areas of expertise of the respondents were: stroke (5), head and neck (2), dementia (1), other (2). Consensus was excellent (100%) for recommendations of full oral intake; moderate (78%) to low (56%) for minimal oral trials of liquids (e.g. 5 sips) and solids (e.g. 5 tsps.) respectively, and high (89%) and moderate (78%) for consistent oral trials of liquids (e.g. 100 ml) and solids (e.g. half portions of diet) respectively (Supplementary Table II). [...] intake, high (89%) for minimal oral trials and moderate (67%) for consistent oral trials (Supplementary Table II). Respondents' comments requested clarification on how to score consistent amounts of oral trials and liquid diets.
Content validity. Ten of the 20 invited UK-based SLTs responded to the anonymous survey. This is an acceptable number of expert views for undertaking content validation of an instrument 15 . All but two components of the DSRS sub-scales had "excellent" relevance (I-CVI > 0.90); "pudding consistency" was good and "selected textures" was fair (Table 2). At a scale level, both the fluid and food scales achieved an S-CVI/Ave rating of 0.84 (good) and the supervision scale a rating of 0.96 (excellent). Expert feedback regarding wording and comprehensiveness is given in Supplementary Table III; much of this related to the lack of mention of IDDSI 26 in the DSRS definitions, a point we address in the Discussion.
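The item-level and scale-level indices reported above (I-CVI and S-CVI/Ave) can be sketched as follows. The rule that ratings of 3 or 4 on the 4-point relevance scale count as "relevant" is the convention commonly used for the CVI and is an assumption here, since the scoring description in the Methods is incomplete; the function names and the item labels in the usage note are hypothetical.

```python
def item_cvi(ratings):
    """I-CVI: proportion of experts rating an item 3 or 4 on a 1-4 relevance scale.

    Treating 3 and 4 as 'relevant' is an assumed convention, not confirmed by the source.
    """
    if not ratings:
        raise ValueError("need at least one expert rating")
    if any(r not in (1, 2, 3, 4) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 4")
    return sum(1 for r in ratings if r >= 3) / len(ratings)


def scale_cvi_ave(ratings_by_item):
    """S-CVI/Ave: mean of the I-CVI values across all items of a subscale.

    `ratings_by_item` maps an item label to the list of expert ratings for it.
    """
    if not ratings_by_item:
        raise ValueError("need at least one item")
    cvis = [item_cvi(r) for r in ratings_by_item.values()]
    return sum(cvis) / len(cvis)
```

For example, `scale_cvi_ave({"item_a": [4, 4, 3, 3], "item_b": [4, 2, 1, 3]})` averages an I-CVI of 1.0 and an I-CVI of 0.5.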

Concurrent criterion validity.
Data were available for all four trials 5,[8][9][10] . In the largest (STEPS), DSRS at baseline and weeks 2 and 13 was associated significantly and in appropriate directions with measures, at the same time points, of aspiration (PAS using VFS), swallowing (TOR-BSST), disability (Barthel index) and dependency (modified Rankin scale) (Table 3). At 2 weeks post randomisation, DSRS was also associated with impairment (NIHSS). DSRS was not related to quality of life measures at 13 weeks post randomisation. The three sub-scale components of the DSRS (fluids, diet and supervision) were also each associated significantly with aspiration at all three time points. Similar magnitudes of associations were seen in the smaller studies of Jayasekeran 5 and Vasant 8 (Table 3) although associations did not always reach significance in these studies 8 . Overall, associations were stronger between DSRS and measures of swallowing and aspiration than with global measures of impairment (NIHSS), disability (BI) and dependency (mRS).
DSRS was strongly negatively correlated with FOIS at day 2 and week 13 in the PHAST-TRAC trial (Table 3); the association could not be calculated at baseline since all participants had a DSRS of 12/FOIS of 1 as part of the trial's inclusion criteria 27 .
Predictive criterion validity. Using data from the STEPS trial, baseline DSRS was associated with radiological aspiration (VFS PAS) at 2 and 13 weeks; and swallowing (TOR-BSST), disability (BI) and dependency (mRS) at 2 weeks (Table 4). There was no association with impairment (NIHSS), or quality of life (EQ-5D-3L, EQ-VAS). The three DSRS sub-scale components (fluids, diet and supervision) at baseline were also each associated significantly with radiological aspiration at 2 and 13 weeks.
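The associations in the criterion validity analyses are rank correlations. The following minimal sketch computes Spearman's rs with average ranks for ties; it is an illustration only (the function names are hypothetical and no p-value is computed). It also shows why no coefficient can be obtained when every participant has the same score, as at baseline in PHAST-TRAC.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        # All scores identical (e.g. every DSRS = 12): correlation is undefined.
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Because DSRS and FOIS run in opposite directions of severity, a perfectly monotone relationship between them gives rs = −1.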
Associations between baseline DSRS and post-treatment measures in the trials of Jayasekeran and Vasant were not statistically significant. It was not possible to assess the relationship between baseline DSRS and post-treatment FOIS in the PHAST-TRAC trial since all participants had a baseline DSRS score of 12 27 .
Internal consistency. The interrelation between the scores from the three subscales, at various timepoints, was assessed using Cronbach's alpha 20 using data from the STEPS, Vasant and PHAST-TRAC trials [8][9][10] . Internal consistency was "Good" at baseline, varied between "Good" and "Excellent" over the first two weeks, and was "Excellent" at 12 weeks (Supplementary Table IV). Similarly, audit of clinical data by JB and AH revealed "Excellent" consistency between the subscales (Supplementary Table V).
Inter/intra-rater reliability. DSRS was scored in 31-58 hospitalised stroke patients by JB and AH. The inter-rater reliability was "Excellent" for DSRS with intra-class correlation (ICC) = 0.955 (95% confidence intervals 0.925, 0.973); similarly, the intra-rater reliability was "Excellent" (Table 5). Assessments within the subscales were mostly excellent, with one good and one moderate result.
Sensitivity to change. DSRS scores were sensitive to spontaneous recovery for patients with acute/subacute PSD, declining during follow-up in STEPS with modal values of 12, 3 and 0 at weeks 0 (baseline), 2 and 13 respectively (Fig. 1). Similarly, the median (7, 4, 1) and mean (7.6, 4.9, 2.7) values declined at the same timepoints. As with VFS-PAS, DSRS was sensitive to treatment with pharyngeal electrical stimulation in a meta-analysis of three pilot trials, being 1.7 points lower (p = 0.040) in the PES group as compared with the control group 13 . In contrast, the STEPS trial was neutral for the effect of PES on VFS-PAS and there was no difference in DSRS scores between treatment groups 9 .
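Internal consistency in this section is summarised with Cronbach's alpha. As a minimal sketch of that statistic for the three DSRS subscales, assuming sample (n − 1) variances and a hypothetical function name `cronbach_alpha`:

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for k items scored on the same n patients.

    `items` is a list of k lists (e.g. fluids, diet, supervision), each holding
    one subscale's scores in the same patient order.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(items)
    n = len(items[0]) if k else 0
    if k < 2 or n < 2 or any(len(col) != n for col in items):
        raise ValueError("need at least 2 items scored on the same 2+ patients")

    item_variances = [statistics.variance(col) for col in items]
    totals = [sum(scores) for scores in zip(*items)]  # per-patient total score
    total_variance = statistics.variance(totals)
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are identical")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

When the three subscales move together perfectly, alpha equals 1. As with rank correlation, it cannot be computed for a sample in which every patient has the same total score.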

Minimal clinically important difference (MCID). The survey was based on 84 responses from UK-based SLTs, the majority of whom had more than 10 years' experience. It was not possible to estimate the number that received the survey; therefore, the response rate could not be calculated. MCID varied between 0.3 and 2.5, with all three approaches (statistical, anchor and survey) identifying an MCID of 1.0 as being important (Supplementary Table VI).

Relationship between DSRS and FOIS. FOIS could be extrapolated from DSRS scores (Supplementary Table VII); however, some combinations of DSRS subscale scores are incongruent from a clinical perspective (e.g. use of thickened fluids when taking a normal diet) and so these have no equivalent FOIS value. Conversely, DSRS could be estimated from FOIS although in most cases it was not possible to determine subscale results since the subscales of supervision and fluids above level 3 are not scored on the FOIS (Supplementary Table VIII).
The PHAST-TRAC trial recorded both DSRS and FOIS at multiple post-randomisation timepoints (days 2, 4, 6, 8, 10, 30 and 90) 10 . The frequency of paired scorings is shown in Supplementary Table IX. The inverse nature of DSRS and FOIS is noted and percentages match the estimated equivalents in Supplementary Tables VII and VIII.

Discussion
This comprehensive assessment of the DSRS suggests that it is a valid tool for grading dysphagia severity (based on oral intake and supervision requirements) in patients with post-stroke dysphagia. Using data from four randomised controlled trials and two surveys, the DSRS was found to exhibit consensual validity, content validity, concurrent criterion validity, predictive criterion validity and internal consistency. Once operationalisation of scoring for certain feeding scenarios was undertaken, inter- and intra-rater reliability were "excellent" when used in a clinical audit, and the minimal clinically important difference approximated to 1 unit irrespective of the method of estimation. The DSRS was sensitive to change during the natural resolution of dysphagia seen through the sub-acute and rehabilitation phases after stroke, and in response to treatment with pharyngeal electrical stimulation in some trials. The intrinsic relationship between DSRS and FOIS allowed these two dysphagia scales to be mapped to each other.
The main strength of this study is the large number and variety of detailed validations performed. We also provide data on minimal clinical important difference and a means for interconverting the DSRS and FOIS. Second, much data came from two phase III trials (STEPS, PHAST-TRAC) rather than just a number of smaller studies. Third, patients with a range of post-stroke severity were included in these validations, with mild-to-moderate patients coming from three trials 5,8,9 and more severe ones from one 10 . Fourth, a large amount of clinical and radiological outcome data were available. This showed that overall, the DSRS was highly correlated with another clinical measure of dysphagia severity (FOIS). Measures of aspiration (VFS-PAS) and swallowing (TOR-BSST) were more strongly correlated than global measures of impairment, disability, and dependency (although these still showed some significant correlations) and (perhaps surprisingly) the DSRS was not correlated with a generic health status measure of quality of life.
There are a number of caveats to the study. First, although all trial protocols gave some guidance on how to use the DSRS, it was not the primary outcome measure in any study and scoring was largely done according to local practice. Hence, the DSRS scores, whilst prospectively collected, are potentially less accurate than could be achieved with formal training and this was reflected in the consensual validity exercise and respondents' accompanying comments. In particular, there was less consensus for scoring patients on oral trials and liquid diets, as noted previously 14 . There was also less consensus on assigning supervision scores for patients on consistent amounts of oral trials, i.e. respondents found it easier to score supervision for patients either on full oral intake or limited trials. It is important that raters routinely using the DSRS clearly specify supervision level when making recommendations following the clinical bedside assessment. In the updated version of the DSRS (Table 6), we provide rules for scoring supervision, including assigning diet, fluid and supervision scores for oral trials.
Second, the DSRS was devised and first used in 2010 and so antedates the 2017 IDDSI scale for determining levels of fluid thickness and modified food textures 26 . Further, DSRS measures different domains from IDDSI. Nevertheless, some respondents in our assessment of content validity commented, with regard to wording and comprehensiveness, that the DSRS does not contain IDDSI terminology. Going forward, we have proposed a redefinition of the DSRS to reflect IDDSI descriptors (Table 6) and plan to validate this updated scale in due course. Third, although the associations between DSRS and other radiological and clinical measures in the trials of Jayasekeran and Vasant 5,8 were similar in magnitude to those seen in the STEPS trial, most were statistically non-significant due to their much smaller sample size and so reduced statistical power. This emphasises the importance of having large data sets when performing validation studies of clinical scales. Last, the distribution of DSRS will depend on the population of patients being studied and timing after stroke, and ceiling and floor effects are present at different times after stroke; for example in STEPS, one third of participants had a maximum score of 12 at baseline (reflecting the trial's inclusion criteria) and a minimum score of zero 13 weeks later after natural resolution of dysphagia; this situation is analogous with other scales used in stroke, e.g. the Barthel Index 28 .
In summary, this study has shown that the 12-level DSRS is robust in terms of consensual, content, concurrent criterion and predictive criterion validity. Further, it shows "good-to-excellent" internal consistency, "excellent" inter- and intra-rater reliability, is sensitive to natural and therapeutic change, and has a minimal clinically important difference of 1 point. However, distribution of scores will depend on patient population and time post-onset. Specific guidance for accurate use of the DSRS is provided in the updated version, which includes incorporation of the new IDDSI descriptors. Overall, our results suggest the DSRS is a valid tool for grading the severity of dysphagia in stroke; its ease of use makes it relevant for use in clinical service delivery and clinical trials to define baseline dysphagia severity and assess the effect of natural history or therapeutic change.
Note to the updated DSRS (Table 6): DSRS supervision score 3 is always chosen when a patient is on limited or consistent oral trials and still requires an NG/PEG tube. Oral trials are scored from the fluid and diet subscales (i.e. 3 onwards) and can be either trials of food or fluid or trials of food and fluids.
(2020) 10:7268 | https://doi.org/10.1038/s41598-020-64208-9 | www.nature.com/scientificreports