Introduction

Cataract surgery is one of the most frequently undertaken surgical procedures globally.1, 2 Traditionally, monocular visual acuity has been used to assess pre-operative need for surgery and post-operative success. The inadequacies of this approach have been widely recognised3 and in recent times patient reported outcome measures (PROMs) have attracted greater emphasis with a plethora of instruments being offered for use in cataract.3, 4 Despite the existence of available questionnaires there have been recent high level calls for better PROM instruments for cataract surgery in the NHS,5 including a 2017 high priority research recommendation from the National Institute for Health and Care Excellence (NICE).6 Early instruments were developed using Classical Test Theory, with many having subsequently been re-evaluated using modern item-response-based statistical techniques, in particular, Rasch analysis.4 This approach to vision-related self-report questionnaires provides for development of a unidimensional instrument capable of measuring an underlying ‘latent trait’ of visual difficulties. Rasch analysis allows sets of questions to be analysed to reveal whether a single or multiple measurement constructs are being addressed by the questions. Previous studies have adopted an approach where items from existing questionnaires are grouped into unidimensional subscales, each of which measures a slightly different construct.7, 8 For a valid assessment of dimensionality a certain number of items are required, typically around 10 questions being deemed sufficient, although as few as 3 or 4 questions have been analysed in this way to confirm or refute unidimensionality.7, 9 Item-banking of questions can provide a useful research tool10, 11 and may be applicable where large item sets are available. 
There may, however, be disadvantages to item-banking: patients do not all complete the same questions, or the same number of questions; it does not enable fixed scoring systems; it is less suitable for specialised specific latent traits; and it prevents respondents from returning to earlier questions to amend responses. The approach used here was to select the smallest number of items compatible with good psychometric performance, an approach which ensures that the best-performing items are used on each occasion. The small number of fixed items maintains flexibility by allowing for either pen and paper completion or electronic entry of responses by patients themselves. To be of practical value in high volume cataract surgical settings it is critically important for questionnaire instruments to be brief. Psychometrically, a trade-off exists between questionnaire length and performance, including responsiveness to surgical intervention, making questionnaire design and item selection paramount. In this paper, the development of Cat-PROM5, a very brief five-item cataract patient reported outcome measure, is described, illustrating performance similar to current ‘best of class’ longer instruments.

Materials and methods

Study design

The study was set in 4 cataract surgical centres (Bristol, Torbay, Cheltenham, Brighton) in the English National Health Service (NHS). Questions were harvested from 2 existing United Kingdom (UK) developed questionnaires, the Visual Symptoms and Quality of life questionnaire (VSQ)12 originally developed for a randomized trial of second eye cataract surgery13 and the Vision Core Module 1 (VCM1)14 originally developed as a generic Vision Related Quality of Life (VR-QoL) questionnaire. The original items were separately generated through 40 (VSQ) and 38 (VCM1) in-depth interviews with patients, with subsequent operationalisation involving a further 58 patients (VCM1). Building on this earlier work, the full set of VSQ items was reviewed and those deemed too complex and/or of low applicability were excluded. Ten VSQ items were retained and re-operationalised together with 10 VCM1 items and an additional general vision question, giving an initial set of 21 items for evaluation. These items included three theoretical constructs related to self-reported issues with vision: (1) visual functioning (also known as visual disability, or activity limitations); (2) visual symptoms; (3) emotional impacts of vision.

Rather than attempting to impose an a priori theoretical subscale classification onto questionnaire items, a data (patient) led iterative multistage design was employed to simply eliminate subscales. This included three separate data collection cycles as outlined in Figure 1. For the initial pilot or ‘Cycle 0’, baseline pre-operative questionnaire completions were analysed by Rasch followed by Factor Analyses to exclude disordered and misfitting items and assess dimensionality. Item reduction continued until a unidimensional item set had been achieved. At this stage, based on the pilot data, the retained unidimensional items ‘moved together’ indicating that a single construct or ‘aspect of vision’ was being measured collectively by these items. At the next stage, ‘Cycle 1’, both pre-operative and post-operative questionnaire completions were analysed. Psychometric performance of retained items was checked to confirm performance, including unidimensionality, and their responsiveness to surgical intervention was estimated. Having eliminated items belonging to constructs other than the central focus of the item set, the retained items were deemed to measure a single construct, which we describe as visual difficulty related to cataract. Further item reduction using a comprehensive assessment process resulted in selection of a definitive five-item set. In the final confirmation stage, ‘Cycle 2’, performance of the selected definitive five-item set was re-evaluated using a further sample of pre- and post-operative questionnaire completions. As part of ‘Cycle 2’ a 1 in 5 random subsample of participants made a second pre-operative questionnaire completion at least 2 weeks following the first to provide for a test–re-test analysis. Finally, data for the definitive five-item set from all Cycles were aggregated for a combined analysis, which included calibration of the questionnaire items.

Figure 1

Flow chart for Cat-PROM5 development study. People awaiting cataract surgery without other visually significant comorbidities in either eye were recruited following informed consent.

People with age related cataract who were awaiting first or second eye cataract surgery at participating centres were potentially eligible for recruitment. Inclusion criteria were: age 50 years or older, ability to understand and complete development versions of Cat-PROM and Catquest-9SF in English, and willingness to participate. Exclusion criteria were: visually significant ocular or systemic comorbidity, for example, advanced age-related or diabetic maculopathy, significant amblyopia (VA worse than 6/12 = 0.3 LogMAR), gross visual field loss (any cause), or any other visually significant ocular or systemic comorbidity that in the opinion of the local principal investigator rendered the patient unsuitable for the study. These criteria were used to recruit typical NHS patients approaching cataract surgery and to avoid possible confusion between vision issues due to cataract and non-cataract comorbidities. As a precaution, and as reported elsewhere,15 a separate qualitative study was undertaken with people who had both cataract and other visually significant comorbidities to check that these comorbidities did not cause serious difficulties with the use of the questionnaire. Data were transcribed to a purpose built study database at study sites, with regular source data verification to assure data quality. The study was conducted in compliance with all applicable regulatory requirements (ethics ref: 13/NW/0616).

Rasch modelling

Although Rasch proposed his model as a solution for measurement problems specific to educational testing the ideas underlying this model have been adopted as a tool for construction and validation of whole-person concepts such as attitudes, symptoms, perceptions, and (dis)abilities.16, 17, 18 The method provides an estimation mechanism for conversion of ordinal questionnaire data into an interval measure which conforms to the axioms of fundamental measurement, more familiar in the physical sciences.16, 19 This measure takes the form of the Rasch continuum in units of logits, positioning both respondents and items (and their categories) onto the same underlying latent scale, in this case that of self-reported issues with vision due to cataract.

The process of Rasch scaling amounts to a series of iterative procedures testing whether fundamental assumptions of the model hold for a particular set of items or questions, with sequential exclusions. When generating the Rasch parameters, to avoid violation of the underlying assumptions of the model we used only a single completion per person, these being randomly selected as either pre- or post-operative completions, but never both. Since the question structures and rating categories varied, analysis using the Rasch Partial Credit Model (PCM)20 was appropriate and this was complemented by supplementary Exploratory and Confirmatory Factor Analyses using polychoric correlations (EFA and CFA). The combination of Rasch and Factor Analyses provides a comprehensive mechanism for assessing dimensionality, that is, checking that all the questions relate to the same underlying construct, in this case visual difficulty related to cataract. Item invariance was checked through differential item functioning (DIF) by analysis of patient data split using 8 sets of criteria, with attention paid to both the statistical significance and magnitude of observed contrasts. The purpose of DIF analysis is to test whether individual questions are used in the same way or differently by individuals belonging to identifiable subgroups, for example, male vs female or younger vs older. The analytical parameters deployed in the development process, along with acceptance/rejection criteria, are summarised in Table 1.
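
As an illustration of how the Partial Credit Model places people and item categories on one logit scale, the following minimal Python sketch computes category probabilities for a single item using the standard PCM formulation. The function name and the threshold values are hypothetical and are not taken from the study; the analyses reported here were not necessarily implemented in this way.

```python
import math


def pcm_category_probabilities(theta, thresholds):
    """Return P(X = k), k = 0..m, for one item under the partial credit model.

    theta      -- person measure in logits
    thresholds -- the m step (threshold) parameters of the item, in logits
    """
    # Category k has log-weight sum_{j<=k} (theta - delta_j); category 0 has weight 0.
    log_weights = [0.0]
    for delta in thresholds:
        log_weights.append(log_weights[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_weights)
    weights = [math.exp(w - peak) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical four-category item with illustrative thresholds (not study values).
example_thresholds = [-1.5, 0.0, 1.5]
probs = pcm_category_probabilities(theta=0.0, thresholds=example_thresholds)
print([round(p, 3) for p in probs])
```

When the person measure equals a threshold, the two adjacent categories are equally probable, which is the sense in which thresholds act as probability crossover points.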

Table 1 Psychometric properties of the scale and criteria for acceptability

To assess the scale’s responsiveness to surgical intervention we considered the pre- to post-operative mean differences in Logits and Cohen’s d, the latter calculated by two methods (to facilitate comparisons with other studies), firstly using the theoretically more sound pre-operative baseline SD and secondly using the traditional pooled pre- and post-operative SD.
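
The two effect-size calculations can be sketched as follows. This is a minimal Python illustration with made-up person measures; the source does not state the exact pooling formula, so the conventional pooled-variance formula for two groups is assumed here, and the function name is hypothetical.

```python
import math
import statistics


def cohens_d_two_ways(pre, post):
    """Return (mean difference, d using pre-operative SD, d using pooled SD)."""
    mean_diff = statistics.mean(post) - statistics.mean(pre)
    sd_pre = statistics.stdev(pre)    # sample SD of baseline measures
    sd_post = statistics.stdev(post)  # sample SD of post-operative measures
    n_pre, n_post = len(pre), len(post)
    # Assumption: conventional pooled SD weighted by degrees of freedom.
    pooled_sd = math.sqrt(
        ((n_pre - 1) * sd_pre ** 2 + (n_post - 1) * sd_post ** 2)
        / (n_pre + n_post - 2)
    )
    return mean_diff, mean_diff / sd_pre, mean_diff / pooled_sd


# Made-up person measures in logits, for illustration only.
pre_measures = [1.0, -1.0, 0.5, -0.5, 0.0]
post_measures = [-3.0, -5.0, -2.0, -4.0, -1.0]
diff, d_baseline, d_pooled = cohens_d_two_ways(pre_measures, post_measures)
print(diff, d_baseline, d_pooled)
```

A negative value indicates a reduction in self-reported visual difficulty. The two estimates differ whenever the pre- and post-operative spreads differ.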

Results

Study participants

Across all three cycles of the study there were 822 participants with analysable data on 1266 completed questionnaires. Demographic and other information on study participants is given in Table 2.

Table 2 Sociodemographic characteristics of participants

‘Cycle 0’ or pilot cycle

From the initial item set of 21 questions, items were excluded iteratively, at each successive step the most problematic item being removed prior to Rasch PCM reanalysis. Following exclusions (Figure 1), 12 items remained for which the fundamental Rasch assumptions held. Principal Component Analysis (PCA) on residual variance gave a borderline dimensionality result, and this along with a high residual correlation between two items suggested the possibility of two sub-dimensions. CFA confirmed a need to exclude a further item, following which all analysis parameters were satisfactory. Eleven unidimensional items were taken forward to the next analysis cycle (see online Supplementary Table S1 for item descriptions and Supplementary Table S2 for Rasch parameters).

Cycle 1

Patients in Cycle 1 completed the reduced Cat-PROM questionnaire pre- and again a few weeks post-operatively. The results from Cycle 1 in general confirmed that the set of 11 items were appropriately selected with no reversed thresholds and acceptable Rasch parameters (Supplementary Table S2). DIF analysis returned only minor drifts from the specified limits. The mean self-reported visual difficulty on the Rasch scale changed between pre- and post-operatively by −2.16 logits, from −0.66 to −2.88. The standardized effect size (Cohen’s d)21 was −1.62 SD (pre-op SD), and −1.02 SD (pre- and post-op pooled SD). The 11 items were confirmed as a well-performing unidimensional scale measuring visual difficulty related to cataract.

Since the objective was to develop a short and responsive questionnaire suitable for high-volume cataract surgical services, the relative performance of individual items and subsets of items was considered. Preliminary probing indicated that when the item set was reduced below five items the performance, based on Rasch parameters, dropped unacceptably, identifying a five-item set as the preferred size. On a range of considerations VSQ_Overall and VCM1_Interfere stood out as the best two candidates and it was decided that they should be included in a final item set. To search out the best subset of 5 items every possible combination of 5 items from the pool of 11 was generated, with the constraint that each subset should include VSQ_Overall and VCM1_Interfere. The 84 possible subsets were separately Rasch analysed. Through a comprehensive selection process that included assessment of Rasch performance parameters, responsiveness to surgery, patient preferences and expert opinion, the remaining three items were chosen with VCM1_Interfere, VSQ_Overall, VSQ_Reading, VSQ_Doing, VSQ_Bad_Eye being the optimum choice for the final five-item set.
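
The count of 84 follows from choosing the remaining 3 items from the 9 not already fixed. A minimal Python sketch of the enumeration is given below. Only the five item names reported above are real; the six `item_n` names are placeholders for the other retained items, whose names are not listed in this section.

```python
from itertools import combinations

# The two items fixed in every candidate subset (named in the text).
FIXED_ITEMS = ("VSQ_Overall", "VCM1_Interfere")

# The other 9 items of the 11-item pool. Three names come from the text;
# item_1..item_6 are hypothetical placeholders for the unnamed items.
OTHER_ITEMS = ["VSQ_Reading", "VSQ_Doing", "VSQ_Bad_Eye"] + [
    f"item_{i}" for i in range(1, 7)
]

# Every five-item subset that contains both fixed items.
candidate_subsets = [
    FIXED_ITEMS + extra for extra in combinations(OTHER_ITEMS, 3)
]

print(len(candidate_subsets))  # 84, i.e. 9 choose 3
```

Each candidate subset would then be analysed separately, as described above.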

Cycle 2

As the final stage of the Cat-PROM5 scale validation process, Cycle 2 was designed to check the performance of the definitive five items chosen. Rasch indices for the fresh data were similar to those from Cycle 1 and generally satisfactory (Table 3). Reversed category averages of the two extreme categories of VSQ_Overall were explained by the fact that there were only 3 endorsements of the final category in this sample. There were no serious DIF problems.

Table 3 Psychometric performance of the Cat-PROM5 items for the pre- and post-operative ‘Cycle 2’ and for all Cycles combined. (Items ordered from low to high visual difficulty from above down)

On average pre- to post-operative scores changed by −3.16 logits, corresponding with a standardized effect size (Cohen’s d) of −1.52 SD and −1.11 SD by the two methods. Test–re-test reliability on a 1 in 5 random sample of 53 pre-operative patients indicated acceptable quadratic weighted Kappa for items (0.66–0.73), and an excellent intra-class correlation coefficient for the person measures (logits) of 0.89 (Table 3).
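
For readers unfamiliar with the item-level agreement statistic, a quadratic weighted Kappa for one item can be sketched in Python as below. This is the standard textbook definition rather than the study's own code; the function name is hypothetical, the responses are made up, and response categories are assumed to be coded as integers starting at 0.

```python
def quadratic_weighted_kappa(first, second, n_categories):
    """Quadratic weighted kappa for two sets of ordinal ratings coded 0..k-1.

    Undefined (division by zero) if the expected weighted disagreement is 0,
    for example when every rating falls in one category.
    """
    k = n_categories
    n = len(first)
    # Cross-tabulate first against second completions.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[a][b] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    return 1.0 - weighted_observed / weighted_expected


# Made-up test and re-test responses to one four-category item.
test_responses = [0, 1, 1, 2, 3, 2, 0, 3]
retest_responses = [0, 1, 2, 2, 3, 1, 0, 3]
print(quadratic_weighted_kappa(test_responses, retest_responses, 4))
```

Quadratic weights penalise disagreements by the squared distance between categories, so a one-category shift between completions counts far less than a shift across the whole scale.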

Final calibration

In order to enhance the precision of the calibration exercise the responses to the definitive set of five items were aggregated from all study cycles. The psychometric performance for the combined data was in line with Rasch model expectations. Figure 2 shows the distribution of item category thresholds against the distribution of patients’ measures, illustrating that Cat-PROM5 is well targeted, with no serious ceiling or floor effects. DIF analysis did not indicate major problems with invariance of item difficulties across eight separate patient groupings as illustrated in Supplementary Figure S1.

Figure 2

Cat-PROM5 Person-Item map for all cycles showing respondent distributions for pre-operative (upper panel), post-operative (middle panel) completions, and the Item Locations (Loc) and Thresholds (probability crossover points between adjacent categories, lower panel) on the same Logit scale. In total, 1266 questionnaire completions were available. Pre- and post-operative means −0.41 and −3.61 respectively.

The ‘Overall Vision’ item shift was ‘slight to moderate’ and in the same direction for the pre- vs post-operative split (DIF=0.62) and the 1st vs 2nd eye surgery split (DIF=0.52), each signifying a relative over-statement of visual difficulty in the presence of cataract and/or an under-statement following surgery. The third shift, relating to the pre- vs post-operative split for the ‘Doing’ item, just crossed into the ‘significant’ range. This went in the opposite direction (DIF=−0.65), implying an under-statement of the impact of visual difficulty on activities pre-operatively, which would be consistent with adaptation. Rasch Model indices for the combined data are in Table 3, all being satisfactory and confirming a well-functioning unidimensional Cat-PROM5 scale. Pearson correlations between Cat-PROM5 person measures and pre-operative LogMAR visual acuities were all weak, although highly statistically significant (P<0.001): better eye 0.21; worse eye 0.19; both eyes averaged 0.24; surgery eye 0.21; fellow eye 0.14. Pearson correlation between Cat-PROM5 and Catquest-9SF22 person measures was R=0.85 (P<0.001; N=1,189 completions).

The pre- and post-operative Cat-PROM5 means were −0.41 and −3.61 respectively, with a difference of −3.20 logits and standardized effect size (Cohen’s d) of −1.45 SD and −1.09 SD by the 2 methods, confirming that Cat-PROM5 is highly responsive to surgical intervention. Those pre-operative patients who had cataract affecting both eyes had a mean of +0.01 logits, indicating good targeting for bilateral cataract. Small or greater, medium or greater and large or very large (0.2 SD=0.44, 0.5 SD=1.10, 0.8 SD=1.76 logits) self-reported Cat-PROM5 improvements in visual difficulty were reported by 83%, 72% and 68% of respondents respectively. Provided all 5 questions have been responded to, raw scores from Cat-PROM5 completions may be converted to logits using the online look-up table in Supplementary Table S3.
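
The improvement bands quoted above can be applied to an individual change in logits as in the following minimal Python sketch. The cut-offs are the logit values stated in the text; the function name and band labels are illustrative, and the SD of 2.20 logits is inferred from the stated thresholds (0.2 SD = 0.44) rather than reported directly.

```python
# Cut-offs in logits as stated in the text (0.2, 0.5 and 0.8 SD).
SMALL, MEDIUM, LARGE = 0.44, 1.10, 1.76

# Inferred, not reported: 0.44 / 0.2 gives an SD of about 2.20 logits.
IMPLIED_SD_LOGITS = 2.20


def classify_improvement(pre_logits, post_logits):
    """Band the reduction in visual difficulty between two completions."""
    improvement = pre_logits - post_logits  # lower logits mean less difficulty
    if improvement >= LARGE:
        return "large or very large"
    if improvement >= MEDIUM:
        return "medium"
    if improvement >= SMALL:
        return "small"
    return "negligible or worse"


# Applied to the reported group means (for illustration only).
print(classify_improvement(-0.41, -3.61))              # large or very large
print(round((-3.61 - (-0.41)) / IMPLIED_SD_LOGITS, 2))  # -1.45
```

The second printed value reproduces the reported baseline-SD effect size from the mean change of −3.20 logits and the inferred SD.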

Discussion

A rigorous development approach to Cat-PROM5 based on Rasch and factor analysis parameters obtained from typical UK patients aged 50 years and older undergoing cataract surgery in 4 centres in England has resulted in a questionnaire with a final set of five items with robust psychometric performance. The questions are broad, which allows patients to map the issues of most relevance to them to these questions, avoiding the problem of highly specific questions with limited applicability for some individuals. The five questions vary in presentation format, so respondents need to consider each question individually, which guards against running through the questions and checking the same level for each without adequate thought. The questions have been thoroughly piloted, have high face validity as presented, display good individual performance indices, and the contribution of each item to the scale is highly satisfactory.

This study recruited typical patients undergoing cataract surgery who were free of other visually significant comorbidities, the intention being to avoid possible confusion of responses relating to non-cataract visual difficulties. As reported elsewhere,15 a qualitative exercise was undertaken separately with patients with both cataract and non-cataract causes of vision loss. This did not reveal serious issues with use of the questionnaire in the presence of comorbidities. Subsequent to completion of Cat-PROM5 development the questionnaire has been used in a separate group of 974 cataract patients which included the ‘usual’ spectrum of comorbidities. The performance of the questionnaire is similar in this group, with a mean preoperative score of −0.32 logits and small or more, medium or more and large or very large improvements reported by 80%, 70%, and 62% respectively. The slightly lower proportion reporting large or very large improvements likely reflects the presence of non-cataract comorbidities.

During development, following elimination of poorly functioning, misfitting or clustering items, a unidimensional construct of visual difficulty related to cataract based on 11 items which ‘move together’ was established. The approach to the final item reduction used in this study included a systematic assessment of all possible alternative combinations of items following the decision to retain 2 key general questions, that is, ‘VCM1_Interfere’ and ‘VSQ_Overall’, in a final five-item set. The independent confirmatory Cycle 2 sample and the aggregated ‘all cycle’ analyses affirmed the psychometric performance of the final Cat-PROM5 item set. From the aggregated data, it is clear that the instrument conforms to the fundamental requirements of measurement as demonstrated by close fit with the theoretical requirements of the Rasch model (Table 3). Item invariance was satisfactory: only 3 of the 40 item-by-grouping DIF contrasts (7.5%) fell outside of the 5% random chance limit, with DIF magnitudes borderline23 and 2 in opposite directions, so tending to cancel each other out. Correlations with visual acuity were weak, confirming that Cat-PROM5 measures a latent trait which goes beyond traditional visual acuity. Correlation with the Catquest-9SF self-report instrument, however, was strong (R=0.85); a direct comparison between the two instruments is published separately.15 Test–re-test repeatability was excellent (ICC=0.89) with high responsiveness to surgical intervention for cataract and a standardised effect size, Cohen’s d, of −1.45 SD (baseline SD method).

Cat-PROM5 (online S4) is offered as a well performing self-report instrument suitable for use in high volume surgical services for age related cataracts. The ‘look-up table’ provided in Supplementary Table S3 will allow users to calibrate responses for their own patients and convert raw score totals from the five questions into a single measure of visual difficulty in units of logits. A fixed scoring system allows direct comparisons within and between countries though may not fully translate to other cultures and languages where a Rasch based re-calibration exercise may be required.

In conclusion, the approach used to develop Cat-PROM5 has delivered a psychometrically robust, validated, well targeted and highly responsive five-item questionnaire which can be considered as an appropriate and fit-for-purpose tool of sufficient brevity for realistic implementation in high-volume cataract surgical services.