Reliability and validity of a Central Kurdish version of the Dizziness Handicap Inventory

We cross-culturally adapted the Dizziness Handicap Inventory (DHI) into the Central Kurdish dialect (DHI-CK) and verified its reliability and validity. A cross-sectional study was utilised to measure the impacts of vestibular disorders. Along with the DHI-CK, two comparators were introduced: the Visual Analogue Scale and the Clinical Test of Sensory Interaction and Balance. External and internal reliability were tested with the intraclass correlation coefficient (ICC) and Cronbach's alpha/composite reliability, respectively. Patients (n = 301; mean age = 44.5 ± 15.2 years; 59.8% women) presenting with vestibular symptoms for at least 30 days who were diagnosed with a vestibular disorder and healthy participants (n = 43; mean age = 42 ± 17.9 years; 62.8% women) were enrolled (N = 344). The DHI-CK and its three sub-scales (Physical, Emotional, Functional) exhibited good to excellent external reliability: ICCs in the test-retest were 0.93, 0.88, 0.91, and 0.92, respectively. Cronbach's alphas were 0.87, 0.71, 0.75, and 0.73, respectively. Convergent validity was supported by Spearman's correlations between the DHI-CK and the comparators. The receiver operating characteristic curve analysis confirmed discriminating validity. The DHI-CK was cross-culturally validated. It is a reliable and valid tool that can be used by clinicians and researchers to quantify vestibular disorder outcomes in Kurdish-speaking populations.

External reliability. The four scales of the instrument showed good to excellent external reliability; the intraclass correlation coefficients (ICCs) for test-retest reliability of the DHI-P, DHI-E, DHI-F, and DHI-T were 0.88, 0.91, 0.92, and 0.93, respectively. The total scores of both comparators, the Clinical Test of Sensory Interaction and Balance (CTSIB-T) and the Visual Analogue Scale (VAS-T), also exhibited excellent reliability: 0.91 and 0.95, respectively (Table 2).
Internal consistency reliability. Cronbach's alphas (αs) of the DHI-P, DHI-E, DHI-F, and DHI-T were 0.71, 0.75, 0.73, and 0.87, respectively. The average inter-item correlations (AIC) of all scales were satisfactory, as they fell within the acceptable range of 0.2-0.5. The corrected item-total correlations (CI-TC) of the 25 items in all scales showed acceptable values; nearly all of the 25 items in the DHI-T had values above 0.3 (item-F7 was 0.29). Both the composite reliability (rhoC) and the reliability of the partial least squares (rhoA) in the three sub-scales were >0.7 (Table 3).
We estimated the α if item deleted (AIID), that is, the resulting αs of the sub-scales and the total scale when any single item was deleted; no inflation was noticed in these αs.
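The internal-consistency checks above can be illustrated with a minimal sketch. It assumes the standard variance-based formula for Cronbach's alpha and uses hypothetical toy responses; the function names and the data are illustrative and are not taken from the study, which ran these analyses in SPSS.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns (one list of scores per item)."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]           # per-respondent total score
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Hypothetical toy data: 4 items, 6 respondents (0 = no, 2 = sometimes, 4 = yes).
toy_items = [
    [0, 2, 4, 4, 0, 2],
    [0, 2, 4, 2, 0, 2],
    [2, 2, 4, 4, 0, 0],
    [0, 0, 4, 4, 2, 2],
]
alpha_all = cronbach_alpha(toy_items)
alpha_drop = alpha_if_item_deleted(toy_items)
inflated = [i for i, a in enumerate(alpha_drop) if a > alpha_all]  # items whose removal raises alpha
print(round(alpha_all, 3), [round(a, 3) for a in alpha_drop], inflated)
```

An item whose removal raises alpha above the full-scale value would be a candidate for review; the text reports that no such inflation occurred in the DHI-CK.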
In the non-normal item-E15, the frequency of the 301 responses was as follows: yes = 16, sometimes = 11, and no = 274. The standardised values of each of the records were <3.29 except for those of the yes-response records.

Discriminating validity. For the patient/healthy groups, the areas under the receiver operating characteristic (ROC) curve (AUCs) of the DHI-P, DHI-E, DHI-F, and DHI-T scores were 0.94, 0.98, 0.93, and 0.98, respectively; however, for the patients' subgroups, they were 0.54, 0.54, 0.55, and 0.55, respectively (Table 5 and Fig. 2). Moreover, the Mann-Whitney U test retained the null hypothesis when the scores of the patients' subgroups were compared with each other (ps > 0.05); however, the null hypothesis was rejected when the scores of all patients and their subgroups were compared with those of the healthy group (ps < 0.05), and distinct distributions and shapes in all sub-scales and the total scale were revealed (Fig. 3).

Discussion
Validated PROMs are of utmost importance when examining vestibular disorders; unfortunately, to date, there has been no such instrument in Kurdish that can quantify the impact of vestibular disorders. Accordingly, using a focus group and key recommendations, we cross-culturally adapted the DHI into Central Kurdish.
Convincing patients from the target population to participate was not difficult; meticulous explanation of the potential benefits of this study by the authors and the interviewers (raters) likely increased the participation rate.

www.nature.com/scientificreports

However, maintaining participants' motivation was challenging. We occasionally noticed that, after a few responses, participants' interest declined, which was resolved by changing from self-administered to interviewer-administered completion. Hence, employing interviewers was essential. Interviewers were instructed to delineate bias scores in cases of unreliable respondents, prestige-bias (where the patient reports what s/he wants instead of what s/he feels), and halo-effects (where the patient overgeneralises the responses in either a positive or negative direction) 15.

Dizziness is a broad term, and it might be of non-vestibular origin 16; however, the DHI was originally developed to evaluate the consequences of vestibular disorders. Therefore, to ensure sample representativeness, we only included cases with a vestibular origin. Additionally, patients were of various ages from diverse settings.

The DHI-CK and its three sub-scales showed good to excellent external reliability. The present study almost replicated the test-retest reliability of the original scale 9 and of other translated versions 13,17-19. Further, the internal consistency was broadly examined through most of the recommended criteria, and the DHI-CK and its three sub-scales had acceptable to good reliability. The CI-TC values for each item in the DHI-CK were compared with those of the original and German versions 19, which also revealed internal consistency (see Table 3). However, our cut-off point of 0.2 for the CI-TC (the same as that used for the German version) differed from those reported (e.g. 0.3, 0.4, and 0.5) by other guidelines 20,21. If we consider this discrepancy and recall that the DHI was originally developed based on the CI-TC, one could question the structure of this popular PROM. In other words, factor analysis is superior to the CI-TC when examining the structural organisation of sub-scales. This was tested by both Kurre and colleagues 22 and Tamber and colleagues 18; when they subjected the DHI to a structural analysis, structures that differed from that of the original were found.
The non-normal E15 item (i.e. 'are you afraid people may think you are intoxicated?') and its effects on the analysis were thoroughly investigated. Concerning bias, a score related to alcohol consumption in a semi-conservative (Kurdish) population is a matter of debate. The possibility of prestige-bias in no-response records was considered, because this response is socially acceptable; the yes-responses (potential outliers) likely provided legitimate data. Accordingly, it would be illogical to remove genuine data; further, deleting these outliers would make the sample less representative. Consequently, to examine the effect of these 16 aberrant cases, instead of deleting them, we analysed the data with and without item-E15. We planned to permanently remove the item from the DHI-CK if there was substantial variation between the two analyses; however, no significant differences were found, and the item was therefore retained.

Table 5. The ability of the scales to discriminate between different groups and subgroups using the receiver operating characteristic curve. Note: a Subgroups categorised based on duration of vestibular symptoms in months: 1-6 and 7-180 months for subgroups 1 and 2, respectively; b Subgroup-1 and subgroup-2 defined as case and control, respectively. Abbreviations: AUC, Area Under the receiver operating characteristic Curve; DHI-P/E/F/T, Dizziness Handicap Inventory-Physical/Emotional/Functional/Total, respectively.

Our hypotheses regarding convergent validity were supported; an adequate positive correlation was found between the DHI-T and VAS-T, and a similar association was seen in the German version 19.
Furthermore, the negative, moderate correlations between the related sub-scales (DHI-P and DHI-F) and the objective score (CTSIB-T) in this study were similar to those obtained by Kurre and colleagues 19 and Nikitas and colleagues 23, who correlated distinct types of objective scores with the DHI sub-scales.
This study revealed that the duration of the symptoms did not significantly affect the DHI scores; the instrument could not discriminate between subgroups with different elapsed times since symptom onset, confirming that the scores are cumulative measures. However, the ROC curve analysis and the default Mann-Whitney U test confirmed that the instrument can effectively discriminate between healthy individuals and patients with vestibular disorders.
This study had some limitations. First, there were no validated PROMs for the vestibular specialty in Kurdish to be used as a comparator in this study. Second, the C1/2 was back-translated only once. Lastly, the minimum time interval in the reliability tests was reduced to one day because of patients' housing situation. We noticed that long intervals are not suitable for reproducibility in patients with vestibular disorders because symptoms can change dramatically under the effect of central compensation; therefore, to avoid recall bias, it is better to use other measures, such as those that we mentioned in our Methods section.
Despite these limitations, we believe that this work provides an essential tool that can be used by clinicians and researchers when examining Kurdish-speaking populations with such demanding disorders; moreover, this tool can be used as a cornerstone and a comparator when validating other similar PROMs in the future.
The Kurdish medical community was deprived of any validated PROM in the field of vestibular disorders. Consequently, we cross-culturally adapted the DHI into Central Kurdish (DHI-CK) and verified its external and internal reliability. We also established that it had acceptable convergent and discriminating validity. As an effective PROM, the DHI-CK can be utilised by clinicians and researchers to quantify the impacts of vestibular disorders before and after therapeutic interventions. Further research should assess its internal dimensions, responsiveness, and interpretability.

Ethics.
The present study commenced after obtaining approval (no. 43B) from the Ethical Review Board of the College of Medicine, Sulaimani University, Sulaimani, Kurdistan Region, Iraq. This study was conducted in accordance with the 2008 Declaration of Helsinki. Participants who met the inclusion criteria were enrolled after providing informed, written consent.

Cross-cultural adaptation.
Steps in this process were implemented per regulations provided by two related guidelines 8,24.

Initial stage. The initial stage comprised three steps: (1) Endorsement for cross-cultural adaptation to Kurdish was granted by Professor Jacobson, the original developer 9. (2) We ensured that the translated questions were understandable; unfamiliar words or expressions were to be substituted with the most appropriate alternatives without losing their meaning. (3) We implemented the necessary focus-group sessions (consisting of 7 otolaryngologists) according to specific guidelines 25,26.

Translation stage. The DHI was translated from English to Central Kurdish twice: the first copy (C1) by an expert otolaryngologist and the second (C2) by a professional bilingual translator. Both were synthesised to form C1/2. During synthesis, vague words were clarified, and formal expressions were popularised (e.g. 'dancing' was changed to 'shayi', which represents a traditional celebration; and the translated word for 'embarrassed' was replaced by a more popular Arabic word). Then, the C1/2 was back-translated to English and compared with the original version, which revealed that they were congruous; this was followed by minor editing to produce the pre-final copy. Next, a pilot study was conducted with 12 educated patients with good linguistic skills from the target population to clarify the questions, and the focus group was used to determine face and content validity. Eventually, after proofreading, the final copy was created (Supplementary Table S3), and the procedure was reported to the institute.

Design and participants.
A cross-sectional survey was utilised to perform the study; however, for the reliability subgroup, the survey was converted to a short-term longitudinal design.
Enrolment occurred in two well-resourced tertiary clinics that cover a considerable part of the Sulaimani governorate in Iraq. Participants' cognitive state was assessed through a general clinical examination; additionally, for older participants (aged >65 years), the Mini-Mental State Examination was utilised. Inclusion criteria were as follows: being aged 18 to 79 years, having vestibular symptoms for at least 30 days, having received an objective diagnosis of a vestibular disorder, and passing the cognitive assessment. Participants who could not answer or were unable to perform objective tests and those with associated non-vestibular pathology were excluded from analyses. Based on the duration of vestibular symptoms, included patients were categorised into two subgroups: 1 (symptoms for 1-6 months) and 2 (symptoms for 7-180 months).
The sample size was determined based on the participant-to-variable ratio of at least 10 participants for each item 27 . Accordingly, we estimated that 301 patients would be sufficient. From March 2017 to June 2018, patients were included in the study.
The DHI is a self-administered tool; therefore, the interviewer's role was minimal 28. However, because of the inclusion of illiterate participants, the survey involved two interviewers with comparable abilities. The job of the interviewers was to introduce the task, provide any necessary explanations, and/or read the items to participants who could not read.
Patients in the reliability subgroup (n = 70) were rated on two occasions. The interval between occasions was 1 to 5 days for both PROMs, whereas for the objective test (CTSIB) the interval was 1 to 2 hours (to avoid the effects of in-between rehabilitation and/or central adaptation). The time of the second rating was adjusted by the interviewers according to patients' availability.
Randomisation process. While patients were receiving the results of their tests or rehabilitation treatments, they were invited to participate. Those who consented and met the inclusion criteria were systematically numbered. The first patient was selected randomly followed by a constant interval selection.
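The selection rule described above (a random first patient followed by a constant interval) corresponds to systematic sampling, which can be sketched as follows. The interval used in the study is not reported, so the value below, the function name, and the patient numbers are hypothetical.

```python
import random

def systematic_sample(numbered_patients, interval, seed=None):
    """Pick a random start within the first interval, then every interval-th patient."""
    rng = random.Random(seed)
    start = rng.randrange(interval)        # random first selection
    return numbered_patients[start::interval]

# Hypothetical example: 100 consenting, eligible patients numbered 1..100, interval of 5.
eligible = list(range(1, 101))
selected = systematic_sample(eligible, interval=5, seed=0)
print(selected[:4], len(selected))
```

Fixing the seed only makes the illustration repeatable; in practice the starting point would be drawn once and recorded.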

Measurement errors and recall bias.
Steps were taken to minimise measurement errors and recall bias, such as changing the sequence of the questions, applying a similar setting, excluding unstable patients, and not interfering with the patients during response selection.
Comparator instruments. In addition to the DHI-CK, two other outcome measures were introduced.

VAS.
The VAS has been widely used as an outcome measure. de Boer and colleagues 29 concluded that the VAS has good psychometric properties. Because of the lack of any validated PROMs in Kurdish that can measure the same construct, we utilised the VAS as a comparator. A printed scale with 100 divisions from zero to 100 was used, on which zero denotes no handicap and 100 denotes maximum handicap. Each patient was asked to score his/her overall resultant handicap (VAS-T) since vestibular symptom onset.

CTSIB.
Participants were asked to maintain balance for three trials in six conditions. They stood with legs and feet close together, wearing socks, and looking forward with each palm over the corresponding shoulder. The six conditions were as follows: (1)

A trial ended when the participant either maintained his/her balance for 1 minute or failed to do so by moving a palm or foot, losing balance, seeking assistance, or opening the eyes in an eyes-closed condition. Second and/or third trials were only needed if the participant could not complete the 1 minute in the preceding trial. For each condition, the score was calculated by dividing the total number of seconds across the available trial(s) by the number of trial(s) for that condition, while the CTSIB-T was the total of all six condition scores 30.

Hypotheses. The DHI-CK and the VAS designed for this study are subjective scores; they are cumulative measures of the same construct, i.e. the global handicap induced by vestibular disorders from the onset of symptoms to the time of rating. However, the CTSIB-T is an objective score that measures steadiness at a specific time, i.e. the time of testing 31.
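The CTSIB scoring rule described above (mean seconds per condition, summed over the six conditions) can be written as a short sketch. The function names and the trial times are hypothetical and serve only to make the arithmetic concrete.

```python
def condition_score(trial_seconds):
    """Mean balance time (seconds) over the trials actually performed in one condition."""
    return sum(trial_seconds) / len(trial_seconds)

def ctsib_total(conditions):
    """CTSIB-T: sum of the six per-condition mean times."""
    if len(conditions) != 6:
        raise ValueError("CTSIB-T expects exactly six conditions")
    return sum(condition_score(trials) for trials in conditions)

# Hypothetical participant: a further trial is run only when the previous one fell short of 60 s.
example_trials = [[60], [60], [42, 60], [60], [30, 45, 60], [12, 20, 25]]
example_total = ctsib_total(example_trials)
print(example_total)  # 60 + 60 + 51 + 60 + 45 + 19 = 295.0
```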
To assess the concept and the discriminating ability of the instrument on the basis of duration (the time elapsed from the onset of symptoms to the time of rating), we categorised our patients into two subgroups and devised the following three hypotheses: (1) In all patients, the positive correlation between the DHI-T and VAS-T would be adequate; (2) In all patients, the negative correlation of the CTSIB-T with both the DHI-P and DHI-F would be moderate because they measure steadiness in two distinct ways (objective and subjective); and (3) The distribution of the four DHI scores (three sub-scales and total) would be the same across patients' subgroups because the scores are cumulative measures and are not related to the amount of time elapsed; however, it would differ between all patients/subgroups and the healthy group because the tool was originally designed to measure the impacts of vestibular disorders.

Statistical analyses. Data screening. Records with missing values were excluded pair-wise. Ceiling and floor effects were absent in the three outcome measures. Considering our sample size, an absolute standardised Z-score greater than 3.29 32 and absolute values greater than 2 and 7 for skewness and kurtosis 33, respectively, were considered to indicate non-normality; moreover, a chi-square p-value of <0.001 for the Mahalanobis distance was considered to indicate a multivariate outlier 34. The scores of 24 questions and the four scales were distributed normally, as none of them exceeded these cut-off points. However, normality was violated by item-E15, for which absolute skewness and kurtosis were 3.32 and 9.7, respectively (Supplementary Table S1), and the Z-score of each of the 16 yes-response cases was 3.88 (>3.29); therefore, these cases were considered potential univariate outliers.
We therefore tested the multivariate distribution of all 25 items using an IBM SPSS macro from DeCarlo 35, which revealed asymmetry and significant p-values for both skewness and kurtosis (Mardia's test). Non-normality is expected in ordinal data such as Likert items 36; consequently, we followed Feng et al. 37 and utilised non-parametric tests instead of log-transformation.
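The univariate screening of item-E15 can be retraced from the reported response counts. The sketch below assumes the conventional DHI item scoring (no = 0, sometimes = 2, yes = 4), which this section does not state, and uses bias-adjusted skewness and kurtosis estimators of the kind commonly reported by statistics packages; it is an illustration, not the authors' SPSS procedure.

```python
from math import sqrt

# Response counts for item-E15 as reported in the Results.
counts = {"no": 274, "sometimes": 11, "yes": 16}
# Assumption: conventional DHI scoring, not stated in this section.
points = {"no": 0, "sometimes": 2, "yes": 4}

scores = [points[answer] for answer, count in counts.items() for _ in range(count)]
n = len(scores)
mean = sum(scores) / n
deviations = [s - mean for s in scores]

def central_moment(order):
    """Population central moment of the given order."""
    return sum(d ** order for d in deviations) / n

sample_sd = sqrt(sum(d ** 2 for d in deviations) / (n - 1))
z_yes = (points["yes"] - mean) / sample_sd   # standardised score of a yes-response

# Bias-adjusted sample skewness and excess kurtosis.
g1 = central_moment(3) / central_moment(2) ** 1.5
g2 = central_moment(4) / central_moment(2) ** 2 - 3
skewness = g1 * sqrt(n * (n - 1)) / (n - 2)
kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

flags = {
    "univariate outlier (|z| > 3.29)": abs(z_yes) > 3.29,
    "skewness beyond 2": abs(skewness) > 2,
    "kurtosis beyond 7": abs(kurtosis) > 7,
}
print(round(z_yes, 2), flags)
```

Under the stated scoring assumption, the standardised score of a yes-response rounds to 3.88, matching the value reported for the 16 cases, and both shape statistics exceed their cut-offs.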
External reliability. Because of the involvement of two specific interviewers, the model, type, and definition chosen for the ICC were two-way mixed-effects, mean of k interviewers, and absolute agreement, respectively. Values of <0.5, 0.5 to 0.75, 0.75 to 0.90, and >0.90 indicate poor, moderate, good, and excellent reliability, respectively 28.

Convergent validity. The associations between the DHI-CK and the comparators were examined via Spearman's robust rank correlation 36,41. Reference values for the associations were <0.3, 0.3 to 0.5, 0.5 to 0.7, and >0.7 for weak, moderate, adequate, and high correlations, respectively 42,43.
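As a rough illustration of how the reference bands above are applied, the sketch below computes a Spearman rank correlation from scratch and labels ICC and correlation values. How values falling exactly on a band boundary are assigned is an assumption, since the text leaves the boundaries open, and the paired scores are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))

def classify_icc(icc):
    # Boundary assignment (0.5, 0.75, 0.90) is an assumption.
    if icc < 0.5:
        return "poor"
    if icc <= 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"

def classify_correlation(rho):
    # Boundary assignment (0.3, 0.5, 0.7) is an assumption; the sign is ignored.
    size = abs(rho)
    if size < 0.3:
        return "weak"
    if size < 0.5:
        return "moderate"
    if size < 0.7:
        return "adequate"
    return "high"

# Hypothetical paired total scores for six patients.
dhi_t = [20, 34, 48, 62, 76, 30]
vas_t = [25, 30, 55, 60, 90, 40]
rho = spearman_rho(dhi_t, vas_t)
print(round(rho, 3), classify_correlation(rho))
print([classify_icc(v) for v in (0.88, 0.91, 0.92, 0.93)])  # reported DHI-P, -E, -F, -T ICCs
```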
Discriminating validity. The ability of the four scales to discriminate between different groups and subgroups (that is, the patient/healthy groups and the patients' subgroups) was examined using the following two methods: (1) The ROC curve. Concerning the AUC, we followed Hosmer and colleagues 44, with reference values as follows: AUC = 0.5, 0.5 < AUC < 0.7, 0.7 ≤ AUC < 0.8, 0.8 ≤ AUC < 0.9, and AUC ≥ 0.9 suggested no, poor, acceptable, excellent, and outstanding discrimination, respectively. The Youden indices and their associated criterion values for the scales were estimated. (2) The Mann-Whitney U test, with a significance level of 5%.
Since the shape and the distribution of the scales between the patient and the healthy groups were dissimilar, we compared mean ranks instead of medians; however, for patients' subgroups, we compared the medians because the shapes were similar 41 .
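The two discrimination methods are closely related: the AUC equals the Mann-Whitney U statistic divided by the number of case-control pairs. The sketch below shows that relationship together with a Youden-index search and the AUC bands listed above. The scores are hypothetical, the treatment of AUC values at or below 0.5 is an assumption, and no significance test is included.

```python
def mann_whitney_auc(cases, controls):
    """Return (U, AUC); U counts pairs where the case scores higher, ties counting 0.5."""
    u = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                u += 1.0
            elif case == control:
                u += 0.5
    return u, u / (len(cases) * len(controls))

def best_youden(cases, controls):
    """Return (J, cutoff) maximising J = sensitivity + specificity - 1 for score >= cutoff."""
    best_j, best_cut = -1.0, None
    for cut in sorted(set(cases) | set(controls)):
        sensitivity = sum(c >= cut for c in cases) / len(cases)
        specificity = sum(h < cut for h in controls) / len(controls)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_j, best_cut = j, cut
    return best_j, best_cut

def classify_auc(auc):
    # Values at or below 0.5 are labelled "none" here; that choice is an assumption.
    if auc <= 0.5:
        return "none"
    if auc < 0.7:
        return "poor"
    if auc < 0.8:
        return "acceptable"
    if auc < 0.9:
        return "excellent"
    return "outstanding"

# Hypothetical total scores for six patients and six healthy participants.
patients = [30, 42, 56, 18, 64, 48]
healthy = [0, 4, 2, 8, 18, 0]
u_stat, auc = mann_whitney_auc(patients, healthy)
youden_j, cutoff = best_youden(patients, healthy)
print(u_stat, round(auc, 3), classify_auc(auc), round(youden_j, 3), cutoff)
```

Applied to the reported results, the bands label the patient/healthy AUCs (0.93 to 0.98) as outstanding and the subgroup AUCs (0.54 to 0.55) as poor.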
Software. SPSS 21 (IBM, Armonk, NY, USA) was used for all steps of the analysis except for rhoC and rhoA, which were determined by SmartPLS 3 45 . Data related to the ROC curve analysis (