Introduction

Who defines whether a clinical intervention is successful? It is increasingly recognised that the clinical outcome measures that health professionals value so highly (such as visual acuity, a global index of visual field loss in one eye, or central macular thickness) fail to capture the full impact of ophthalmic disease on a patient’s life. Although ophthalmologists strive to be patient-centred in their approach, dependence on clinical parameters alone is inadequate, and these parameters may not correlate with the patient’s experience of their disease.1, 2 Patient-reported outcomes (PROs) and the measures that capture them help address this important deficit in the clinician’s knowledge in both clinical and research settings.

Patient-reported outcome measures: what are they?

A PRO describes any report or measure of the patient’s health that comes directly from the patient without interpretation by a clinician or a researcher. The measures that capture this information are known as patient-reported outcome measures (PROMs).3, 4 Commonly, PROMs take the form of either paper- or electronic-based patient questionnaires, which measure treatment benefit or risk, and should be validated for use in the target population.4

PROMs may measure outcome in absolute terms, for example, symptom severity, or may measure change, for instance the extent to which a symptom has improved/worsened. PROMs may be used to assess health-related quality of life (a multidimensional concept that usually includes self-report of the way in which physical, emotional, and social well-being are affected by a disease or its treatment), or to assess symptoms or perception of health status.

A helpful approach to thinking about PROMs is to consider their component parts. The specific questions contained within a PROM are known as items, each of which elicits an answer from two or more options. Items that are related may be grouped as domains or subscales. Although there are a number of ways of classifying PROs, we recommend categories that reflect the World Health Organisation International Classification of Functioning, Disability and Health (WHO ICF).5 Accordingly, we group items according to whether they reflect disease impact on (1) bodily symptoms/functions, (2) activities, or (3) social participation. Although not the focus of this review, it should be noted that some measures explore aspects of patient satisfaction (such as with the clinical care they are receiving); the recently described glaucoma POEM is an example of such a hybrid tool that contains items relating to fear of blindness, acceptability and side effects of treatment, and impact on daily life in addition to satisfaction/experience items.1

Why do PROMs matter?

PROMs are an increasing part of clinical effectiveness research6 and may be used as the primary outcome (such as in the Effectiveness in Angle Closure of Lens Extraction (EAGLE) study).7 More commonly, PROMs are used as secondary outcomes, usually complementing a more familiar ‘clinical’ primary outcome such as visual acuity. Recent examples include the inhibition of VEGF in age-related choroidal neovascularisation trial,8 which used the general health utility measure EQ-5D, and the Full-thickness Macular Hole and Internal Limiting Membrane Peeling Study (FILMS),9 which used the vision-specific measure, the National Eye Institute Visual Function Questionnaire (VFQ-25). Further examples are provided in Table 1. In addition to providing evidence of clinical effectiveness, PROs are increasingly used to inform licensing and reimbursement decisions.10 A study in 2004 reviewed the effectiveness end points reported for clinical trials resulting in Food and Drug Administration (FDA) approval of new drugs (specifically ‘new molecular entities (NMEs)’) from 1997 to 2002.11 Of 215 new product labels, PROs were reported in 30% and were the only end point cited in 11%. A more recent study of drugs approved between January 2000 and June 2012 (specifically NMEs and biologic licence applications) looked at FDA approval of ‘PRO labelling’ (ie, where the label of the drug can include claims of treatment benefit with regard to a PRO). Of 308 drugs approved during this period, 70 were specifically approved for ‘PRO labelling’, and for 57 of these, this was based on studies in which the PRO was the primary outcome.12

Table 1 Selected recent ophthalmic trials and their usage of PROMs

There are some areas of medicine where the measurement of a PRO is self-evidently essential and therefore should be a primary trial outcome (eg, pain in a trial of analgesic efficacy). Arguably, however, the assessment of PROs should be a required part of all comparative effectiveness trials. Indeed, PROMs are of particular value in helping inform health-care decisions where there is a significant trade-off to be made between therapeutic benefit and undesirable side effects, or in differentiating between two interventions that appear to offer similar benefits in terms of the primary outcome (eg, equal rates of anatomical closure of a macular hole), but which might vary in terms of patient-reported ability to read or drive. Given that in these situations the relative cost of treatments may also need to be considered, it is no surprise that many studies include a particular type of PROM known as a utility measure, which can be used to provide an estimate of incremental effectiveness in terms of quality-adjusted life years for use in cost-effectiveness analyses.13 Incorporating an economic evaluation within a trial provides valuable information for clinical decision-making and policy, in particular when the clinical effectiveness of the interventions is similar.

For example, in the FILMS study, which compared surgery with or without peeling of the internal limiting membrane (ILM), the findings suggest that surgery with ILM peeling is likely to be a cost-effective option for the treatment of macular holes.14
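The way a utility measure feeds into a cost-effectiveness analysis can be illustrated with a minimal sketch. It assumes the standard definitions of a quality-adjusted life year (utility multiplied by time spent in that health state) and of the incremental cost-effectiveness ratio (difference in cost divided by difference in QALYs). The function names and all figures are hypothetical and are not taken from FILMS or any other trial.

```python
def qalys(utilities, years):
    """Sum of utility x duration over consecutive health states.

    utilities: utility weight for each period (1.0 = full health, 0.0 = dead)
    years: length of each period in years
    """
    if len(utilities) != len(years):
        raise ValueError("utilities and years must have the same length")
    return sum(u * y for u, y in zip(utilities, years))


def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost per QALY gained by the new treatment over the old."""
    delta_qaly = qaly_new - qaly_old
    if delta_qaly == 0:
        raise ValueError("no difference in QALYs; the ratio is undefined")
    return (cost_new - cost_old) / delta_qaly


# Hypothetical two-year follow-up, utilities measured once per year.
qaly_a = qalys([0.80, 0.70], [1, 1])   # treatment A
qaly_b = qalys([0.75, 0.65], [1, 1])   # treatment B
extra_cost_per_qaly = icer(3000, 2000, qaly_a, qaly_b)
```

A real economic evaluation would also discount future costs and benefits and quantify uncertainty around the ratio; both are omitted here for brevity.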

In the United Kingdom, PROMs are also becoming integrated into routine clinical practice.3 Since April 2009, NHS England has used PROMs to audit PROs following four common operations: inguinal hernia, varicose vein surgery, knee replacement, and hip replacement. For example, from 1 April 2011 to 31 March 2012, there were 247 688 PROM-eligible procedures carried out in hospitals, for which 184 829 (74.6%) preoperative questionnaires were obtained and 136 899 (55%) had both preoperative and postoperative questionnaires.15 This systematic and continuous analysis of routine surgical practice provides a wealth of important information, in addition to the standard measures of length of stay and postoperative mortality.3 The internal comparator of a preoperative questionnaire improves the ability to measure the real impact of the intervention rather than simply reflecting demographic differences between regions, ethnic distributions, or case mix. Although the focus of this article is the use of PROMs in ophthalmic research, it is important to realise that PROMs are likely to become a part of routine clinical practice in ophthalmology, and familiarity with them is thus of increasing value to clinical staff as much as to researchers. It should also be recognised that the challenges discussed in this article regarding using PROMs in research are magnified when using these tools in the uncontrolled environment of clinical practice.

The place of PROMs in ophthalmic research

The development of most PROMs in ophthalmic research has been driven by the recognition that clinical tests, such as visual acuity, perimetry, and optical coherence tomography, imperfectly capture the extent to which patients are impacted by sight impairment. Although PROs in ophthalmology do not need to be solely about vision (eg, severity of pain may be a significant outcome in dry eye disease as recognised by the Ocular Surface Disease Index),16 most PROMs were developed with a view to assessing visual function and its consequent impact on activities of daily living and social participation. The first measures were introduced in the early 1980s and include the Visual Function Index developed by Bernth-Peterson17 and the ‘Functional Problems of the Visually Impaired’ tool commissioned by the National Eye Institute (NEI) and developed by Bikson and Bikson of the RAND Corporation.18, 19 It is beyond the scope of this review to discuss all the vision-related quality of life instruments that have since been developed, but it is worth highlighting the development of the National Eye Institute Visual Function Questionnaire (NEI VFQ), which was first published in 1998 and which is, in its shorter form (NEI VFQ-25), the most widely used vision-related PROM in the world (see Box 1).

Selection of PROMs for use in randomised controlled trials

The most appropriate PROM for a study will depend on the study objectives and the target population. It is important for the investigator to first identify what domain or domains they want to measure, which will depend on the underlying hypothesis. For example, this might be difficulty with near-vision activities in a study of presbyopia, difficulty with peripheral vision in a study of glaucoma, or ocular pain in a study of scleritis or keratoconjunctivitis sicca. It is recommended that there is patient engagement from the outset, including patient input into the selection of domains to ensure that the study does indeed capture outcomes that matter to the patient. Having selected the domains of interest, an appropriate PROM should be chosen based on its (1) content, (2) measurement properties, and (3) practical application encompassing acceptability, feasibility, and interpretability.

PROM content

In assessing the suitability of a PROM with regard to content, available measures can be analysed or ‘mapped’ in terms of their items and domain(s) to assess the extent to which they are likely to gather the outcome of interest. At this point, it should be considered whether the study is best served by a generic instrument, a vision-specific instrument, or a condition-specific instrument (if available). Generic instruments, such as the SF-36, assess broad aspects of the quality of life and health status. Vision-specific instruments, such as the NEI VFQ-25, are focused on aspects surrounding visual and emotional function, and so are more sensitive to issues experienced by ophthalmic patients. Condition-specific instruments are targeted to the needs of patients with a single condition. Examples include the VF-14 for patients with cataract or the Glau QOL 36 for patients with glaucoma.20, 21 Although some fields have no condition-specific instruments, others, such as glaucoma, have seen a rapid growth in available instruments. In 2011, Hamzah et al5 conducted a systematic review of available instruments for use in a glaucoma population, identifying 16 vision-specific, 16 glaucoma-specific, and one combined tool. One of the key issues that this study illustrates is the variability of PROMs in this field, both in terms of their scope (which is not always accurately reflected by their title) and their quality. Specifically, it was noted that the glaucoma-specific/combined PROMs comprised five measures focused on the disease (‘glaucoma status measures’) and twelve measures related to glaucoma medication.5 Understanding the construct of the measure is the first step in choosing the appropriate PROM for the study.

PROM measurement properties

Selection of a PROM must also take into account its measurement properties. This may include its reliability, validity, responsiveness, precision, interpretability, acceptability, and feasibility, although it should be noted that these properties do not depend solely on the PROM but will vary according to the population in which it is used.22

Reliability is scored in terms of test-retest reliability (reproducibility in stable patients over time) and inter-rater reliability (agreement between two trained interviewers assessing the same patient at the same time; not applicable for self-administered questionnaires). In addition, internal consistency (correlation between items within a domain) is often calculated for PROMs that contain multiple subscales, each of which measures a distinct construct. High levels of internal consistency indicate that items grouped within such subscales are indeed evaluating aspects of the same concept.23 Recommendations of the European Regulatory Issues on Quality of Life Assessment Group advise minimum standards for these parameters: specifically, an intraclass correlation of 0.7 for test-retest reliability and 0.8 for inter-rater reliability, and a Cronbach alpha of 0.7 for internal consistency (all scored out of 1.0).10
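As an illustration of how internal consistency is quantified, the following minimal sketch computes Cronbach's alpha for a single subscale using the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the response data are hypothetical; the 0.7 threshold is the minimum standard quoted above.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one subscale.

    responses: one list per respondent, each holding that respondent's
    scores on the k items of the subscale (no missing values).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: five respondents answering a three-item subscale.
responses = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 4],
    [1, 2, 1],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
meets_minimum_standard = alpha >= 0.7
```

The sketch assumes complete responses; in practice, missing items would need to be handled before alpha is calculated.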

Validity is, in theory, the extent to which the instrument measures what it is intended to, and in practice, the extent to which the instrument correlates with other instruments and/or clinical outcomes and can detect differences in disease severity between groups of patients. It should be assessed in terms of content validity (the extent to which individual items adequately assess the domains of interest without redundancy), construct validity (the extent to which predetermined logical statistical and clinical relationships hold true for the tool), and criterion validity (the correlation with a reference or ‘gold standard’ instrument).

Responsiveness describes the capacity of the tool to detect significant differences, whether change over time or differences between cohorts at the same time; it is commonly measured by comparison with a reference tool. Precision reflects the gradations of response, that is, whether there are enough distinctions within the scale of measurement to capture different states.

PROMs should be validated for use in the target population, such that it is clear that the tool performs adequately, both in terms of its psychometric characteristics and responsiveness, in the intended setting.4 This process may be undertaken as a part of a dedicated cross-sectional study, or within an intervention-based clinical trial or longitudinal study.24 Readers interested in PROM development and validation are advised to refer to guidance presented by Fayers and Machin25 or Johnson et al26 for further information.

PROM acceptability, feasibility, and interpretability

To be useful in either a research or clinical context, measures must also be acceptable, feasible, and interpretable. Acceptability covers features such as the time required to complete the questionnaire, and will reflect the physical and mental capacity of the population being surveyed. Feasibility considers the resources required to use the measures correctly. Acceptability and feasibility may vary according to patient group and context. It may therefore be appropriate to explore these issues with patient/clinical groups or in a pilot study before using a PROM in a larger-scale clinical trial. Interpretability describes the ease with which we can understand differences in score. The smallest difference that matters to patients, the minimally important difference,27 needs to be determined both for score interpretation and for ensuring that randomised controlled trials (RCTs) are powered to detect important differences if they exist.25
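The link between the minimally important difference and trial size can be sketched with the usual normal-approximation formula for comparing two means, n per arm = 2 × (z for 1 − alpha/2 plus z for the required power)² × SD² / MID². This is a simplified illustration under stated assumptions (two equal arms, a continuous PROM score, a common standard deviation, a two-sided test); the function name and the input values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def participants_per_arm(mid, sd, alpha=0.05, power=0.80):
    """Approximate number of participants needed in each arm to detect
    a between-arm difference of `mid` points in mean PROM score.

    mid:   minimally important difference, in score points
    sd:    assumed common standard deviation of the score
    alpha: two-sided significance level
    power: desired probability of detecting a true difference of `mid`
    """
    if mid <= 0 or sd <= 0:
        raise ValueError("mid and sd must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / mid) ** 2)


# Hypothetical inputs: a 5-point MID on a scale with SD of 15 points.
n = participants_per_arm(mid=5, sd=15)
```

Because the required sample size scales with the inverse square of the difference sought, halving the MID roughly quadruples the number of participants needed. The figure should also be inflated to allow for expected attrition and missing questionnaires.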

Resources for selecting PROMs

Recognising the importance of PROMs, but also the rapid expansion of measures of variable quality, methodologists and trialists have established a number of organisations to provide guidance for PROM development, evaluation, and selection. The Patient-Reported Outcome and Quality of life Instruments Database (PROQOLID; www.proqolid.org)28 is an online source of information on PROMs, which provides key facts for each PROM including author, purpose (disease, population, objective), characteristics (type of instrument, eg, health-related quality of life, patient satisfaction, physical functioning, psychological functioning, administration mode, number of questions, recall period), and languages available. The database can be searched either for a specific instrument (eg, ‘NEI VFQ-25’) or for a particular indication (eg, ‘cataract’). For the evaluation of PROMs, the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) group undertook a multidisciplinary Delphi-based initiative to develop a critical appraisal tool that can be used as a checklist (available at www.cosmin.nl).29 Similarly, the International Society for Quality of Life Research (ISOQOL) has published recommendations of minimum measurement standards for PROMs.30, 31, 32 Finally, the European Medicines Agency and the US FDA provide guidance regarding the use of PROMs in trials that may be submitted as evidence in applications for regulatory approval.4, 10

Designing RCTs with a PROM

PROMs are an integral part of comparative effectiveness studies. Until recently PROMs were the ‘poor relation’ of clinical outcomes, but their importance is increasingly recognised, with greater attention being given to their place in study design, conduct, and reporting.33 Best practice needs to start at the design phase to ensure that the study does indeed meet its objectives, with good compliance, maximal data capture, and interpretable valid results.

Key issues and challenges should be determined at the outset and specified in the study protocol;25 this may be assisted by checklists such as the one by King (University of Sydney, Psycho-Oncology Cooperative Research Group).34 There should be a justification for the use of a PROM within the design, a hypothesis as to the direction and preferably size of any change expected, and the evidence on which this estimate is based (including power calculation). It should be specified whether the PRO is a primary or secondary outcome, the statistical analysis that will be used, and which domains this will include, to minimise the problems associated with multiple statistical testing. The timing of PRO measurement needs to be prespecified, both with regard to the stage in the trial (eg, baseline and intervals) and in terms of when measurement will occur during a visit (commonly before any procedure or treatment). Given that missing data are a particular challenge in PROM research, it is also advisable to prespecify the procedures that will be adopted to maximise compliance and what methods will be used to handle missing data. The mode of administration should also be specified; this is particularly important where patients (such as those with sight loss) may not be able to complete forms designed to be self-administered. The use of ‘proxies’ to complete the forms on behalf of the patient should again be prespecified and documented.25

Using PROMs in RCTs

During the study there are a number of important considerations to ensure that PROM assessment is carried out in a standardised and valid way. It is clearly important that the schedule specified in the protocol is adhered to, and in particular, that the baseline assessment is carried out before randomisation, particularly in open trials where knowledge of the intervention may cause anxiety and impact on the PROM.36

A recurring challenge with PROMs centres around compliance with data collection, that is, how many individual questionnaire items, and how many forms, are missing within a trial. There are two problems. First, missing data points lead to a loss of statistical power, potentially resulting in a type II error (ie, failure to detect a significant difference). Second, the attrition may not be random, but rather due to death, depression, or some other factor resulting in a biased assessment.25, 36

In order to improve data capture, the importance of the PROM data to the study and study question should be emphasised.37 PROM data can be collected face to face in the clinical setting or via postal questionnaires coordinated by the study office. The former can be problematic as the clinical setting may influence participant response. The latter may have a greater risk of missing data. Data capture may be improved by having a centrally managed PRO data monitoring system in place for evaluating compliance across study sites, with data collection reminders to participants where needed and chasing-up of missing items.38 Electronic versions (e-PROMs) that may be filled in online with email prompts are an alternative.

Analysing PROMs in RCTs

As PROMs typically comprise many subdomains and are often assessed at a number of time points within a study, there is a danger of a type I error (ie, a false-positive result) from multiple statistical testing. One option is to use a single test of the aggregate PROM score, but this may result in ‘dilution’ of differences within a key domain of interest. Alternatively, domains of interest can be prespecified as end points that are closely linked to the study hypotheses. In addition, the time point for the outcome data needs to be prespecified, or the data aggregated across time points as an area under the curve analysis. In order to reduce bias, analysis should be undertaken on an intention-to-treat basis (ie, all patients randomised are included in the final analysis whether or not they actually received treatment). This is important as participants lost to or withdrawing from the study may show significant differences in terms of PROM scores, which might be missed.10, 39, 40
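Two of the points above lend themselves to a short numerical sketch. The first function shows how quickly the chance of at least one false-positive result grows with the number of tests, under the simplifying assumption that the tests are independent (subscale scores are usually correlated, so this is an upper-end illustration rather than an exact figure). The second summarises one participant's repeated PROM scores as an area under the curve using the trapezoidal rule. Function names and data are hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across n_tests
    independent tests, each carried out at significance level alpha."""
    return 1 - (1 - alpha) ** n_tests


def score_auc(times, scores):
    """Area under one participant's score-time curve (trapezoidal rule).

    times:  assessment times in ascending order (eg, months from baseline)
    scores: PROM score recorded at each of those times
    """
    if len(times) != len(scores) or len(times) < 2:
        raise ValueError("need matching times and scores, at least two each")
    points = list(zip(times, scores))
    return sum(
        (t1 - t0) * (s0 + s1) / 2
        for (t0, s0), (t1, s1) in zip(points, points[1:])
    )


# Hypothetical: ten subscales each tested once at the 5% level.
risk_of_false_positive = familywise_error_rate(10)

# Hypothetical participant assessed at baseline, 6, and 12 months.
auc = score_auc([0, 6, 12], [50, 60, 70])
```

Summarising each participant's trajectory as a single area under the curve replaces several time-point comparisons with one, which is one way of limiting the number of tests.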

Perhaps the biggest challenges in PROM analysis arise around the handling of missing data. First, the extent of missing data must be reported, including whether this appears to be at random (both between treatment groups, and between different items within the PROM). Second, the extent to which this may undermine the validity of the study needs to be assessed. Third, the missing data may be estimated or ‘imputed’, although it should be recognised that none of the techniques to do this is perfect. Indeed, the FDA recommends using two prespecified imputation techniques and that any significant difference between them, as determined during sensitivity analysis, is regarded as a cause for concern.4 The analytic plan, including details of handling multiplicity and missing data, should be prespecified in the study protocol.41
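The idea of comparing two prespecified imputation techniques in a sensitivity analysis can be sketched as follows. The two methods used here, last observation carried forward and arm-mean imputation, are chosen only because they are simple to show; they are assumptions for illustration and not the techniques recommended by the FDA or by any source cited in this review. All names and data are hypothetical.

```python
from statistics import mean


def final_scores_locf(arm):
    """Last observation carried forward: use each participant's most
    recent non-missing score as their final score."""
    finals = []
    for visits in arm:
        observed = [score for score in visits if score is not None]
        if observed:  # participants with no data at all are dropped
            finals.append(observed[-1])
    return finals


def final_scores_mean_imputed(arm):
    """Replace a missing final score with the mean of the final scores
    observed in the same arm."""
    observed = [visits[-1] for visits in arm if visits[-1] is not None]
    fill = mean(observed)
    return [visits[-1] if visits[-1] is not None else fill for visits in arm]


def treatment_effect(treatment, control, impute):
    """Difference in mean final score (treatment minus control)."""
    return mean(impute(treatment)) - mean(impute(control))


# Hypothetical scores per participant: [baseline, final]; None = missing.
treatment = [[50, 60], [55, 65], [40, None]]
control = [[50, 52], [48, 50], [45, None]]

effect_locf = treatment_effect(treatment, control, final_scores_locf)
effect_mean = treatment_effect(treatment, control, final_scores_mean_imputed)
discrepancy = abs(effect_locf - effect_mean)
```

A large discrepancy relative to the minimally important difference would suggest that the conclusions depend on how the missing data were handled.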

Reporting PROMs in RCTs

A major problem with the reporting of PROMs in clinical trials is that they are underreported, or only reported years after the main study.4, 10, 22, 27, 38, 42 Although one could argue that this is of little consequence when the PROM outcome moves in the same direction as the clinical outcome, the PRO data may help inform future patients about the likely impact of treatment on outcomes that matter to them. In some instances, PROs may moderate the interpretation of a beneficial clinical outcome. Selective reporting of outcomes (unless prespecified) should be avoided, whether these are PROMs or clinical outcomes.43 The Consolidated Standards of Reporting Trials Patient Reported Outcomes (CONSORT PRO) is a recent extension to the CONSORT guidelines that deals specifically with PROs. It provides five checklist items recommended for reporting in all RCTs in which PROs are a primary or important secondary outcome. These are (1) that the PRO is identified as a primary or secondary outcome in the abstract; (2) that the hypothesis regarding the PRO is provided; (3) that evidence of the instrument’s validity and reliability is provided; (4) that statistical approaches for dealing with missing data are explicitly stated; and (5) that limitations and generalisability with specific regard to the PRO data are discussed.31, 33

Disadvantages of PROMs in RCTs

PRO assessment in trials may be costly in terms of licence fees for instruments, administration, data management, trial analyses, and patient time. Thus, PROMs should only be included when there is a clear rationale and hypothesis.

Missing PRO data can be a particular problem in trials. Retrospective data capture may not be possible25 and often PRO data are not missing at random: commonly, those participants with the poorest outcomes fail to complete items or questionnaires.44 Rates of missing PRO data within a trial may be higher in the following instances:

  • Where trial participants are overly burdened with lengthy (or numerous) PROMs.4

  • Where participants are given PROMs containing questions they consider intrusive, or of questionable relevance.45

  • Where data collection staff are unaware of the importance of the PROM to the trial outcome and are not given specific instructions regarding the prevention/management of missing data.

Thus, when designing a trial, PRO data collection should be limited so that the average participant can complete the process in a reasonable time. The PROM should be carefully selected, with input from the target population, to ensure acceptability to the participants in the study.24 Finally, the protocol should contain information on the methods that should be in place to prevent avoidable missing PRO data, for example, education of data collection staff and centralised monitoring of data compliance with back-up data retrieval where possible.46

Trial management groups should also be aware that PROMs occasionally detect concerning levels of participant distress (eg, severe depression or suicidal thoughts) or physical symptoms (eg, pain) that may require an immediate response.47 Accordingly, there should be an a priori plan in place to monitor and manage alerts should they arise, in line with the risk profile of the trial; this may also require additional resources in some instances.

The future of PROMs in ophthalmic research

PROs are a vital part of modern ophthalmology, and are now finding their place both in clinical trials and in routine clinical practice. It should, however, be recognised that the measures that evaluate these outcomes (PROMs) are complex, and need to be utilised and interpreted with care. Guidance from international consortia is directed towards improving the quality of PROMs, assisting researchers in the selection of suitable instruments, and helping them collect, analyse, and publish the results in an accurate and meaningful way.29, 32 The increasing focus on PROs in high-quality ophthalmic trials is an important step, which has the potential to inform care, encourage shared decision-making, and help deliver a patient-centred approach to clinical practice.