Introduction

Ophthalmologists rely on scientific evidence from randomized controlled trials (RCTs) to inform clinical decisions. When designed and executed optimally, large RCTs balance both the known and unknown factors that may affect the outcome of interest (e.g. visual acuity, intraocular pressure), resulting, theoretically, in an observed effect driven solely by the intervention/exposure (e.g. drug or surgery). However, not all RCTs are designed, conducted, and reported with the same methodological rigour; an understanding of the fundamental elements of the RCT is therefore essential if clinicians are to interpret trial results accurately [1, 2].

We will focus on four key elements to assess when interpreting RCTs—risk of bias, statistical power, treatment effect, and applicability.

Assessing the risk of bias

Bias in trials is a systematic deviation from the truth, which can substantially distort the observed treatment effect in an RCT [3, 4]. Clinicians interpreting RCTs should therefore scrutinize the potential sources of bias in a trial and weigh their confidence in the results accordingly. The Cochrane Collaboration’s risk-of-bias tool [5] provides a contemporary framework for assessing the methodological quality of an RCT. The types of bias a clinician should be aware of when interpreting an RCT are described in detail in our previous editorial on risk-of-bias measurement [6].

Assessing statistical power

The sample size calculation determines how many participants must be recruited, randomized, treated, followed up, and analysed in a specific trial to detect a minimum important difference (MID) in the primary outcome of interest. The sample size calculation is therefore a critical component of RCT design, affecting the ability of the study to answer the primary study question correctly. A clinician needs to be aware of how many participants were required, and whether this target was met during the recruitment phase as well as at the subsequent follow-up time points. An illustration of an adequately powered sample is the Protocol W trial investigating the efficacy of intravitreous aflibercept injections for nonproliferative diabetic retinopathy [7], which reported an enrolment target of 386 eyes to provide 89% power to detect an MID of 15% between groups in the primary outcome of centre-involved diabetic macular oedema development with vision loss or proliferative diabetic retinopathy at 2 years of follow-up; this clearly states the rationale behind the sample size calculation and provides the details necessary to assess the statistical power. It is important to note that RCTs that were adequately powered at the outset may lose the required statistical power (the ability to detect the MID) if there is substantial patient dropout, crossover, or loss to follow-up. Underpowered studies are difficult to interpret, especially if the result is not statistically significant. Conversely, overpowered studies can yield results that are statistically significant but not clinically important (e.g. a large RCT on diabetic macular oedema may detect a 20 µm difference in OCT central macular thickness for a new drug, but such a difference, while statistically significant, may not be clinically important).
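To make the arithmetic behind such statements concrete, a commonly used normal-approximation formula for the per-group sample size needed to detect a difference between two proportions p1 and p2, with two-sided significance level α and power 1 − β, is shown below; this is a generic illustration only and not necessarily the exact method used by the Protocol W investigators:

\[
n \approx \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\left[p_{1}(1-p_{1}) + p_{2}(1-p_{2})\right]}{\left(p_{1} - p_{2}\right)^{2}}
\]

where \(z_{1-\alpha/2}\) and \(z_{1-\beta}\) are the corresponding standard-normal quantiles. Any allowance for anticipated dropout or loss to follow-up is added on top of this n, which is why high attrition erodes the power that was planned at the outset.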

Assessing the treatment effect

The heart of any RCT is the “results” section and the observed treatment effect. When interpreting the treatment effect, a clinician ought to consider two key aspects: the precision of the estimate and its clinical importance. The precision of the treatment-effect estimate is best judged by examining the confidence interval (typically 95%). The 95% confidence interval represents the range of values within which the clinician can be 95% confident the true treatment effect lies. A precise estimate manifests as a tight confidence interval; wide confidence intervals should be interpreted with caution, as the true treatment effect could lie anywhere within the wide range of values, making conclusions about the treatment effect unclear. For example, the PANORAMA trial [8] demonstrated a significantly higher proportion of eyes with an improvement of 2 levels or more in the primary outcome of diabetic retinopathy severity scale (DRSS) in the combined aflibercept groups compared with the control group at 24 weeks, with a difference of 52.3% (95% CI 45.2–59.5%; p < 0.001). Given the high precision of this estimate, we can be confident that intravitreal aflibercept increases the proportion of eyes achieving such an improvement by somewhere between 45.2% and 59.5% relative to control. Moreover, the clinical importance of the observed treatment effect is of great interest, as clinical importance translates into meaningful improvement for the patient. A statistically significant result does not necessarily translate into a clinically meaningful difference between groups; conversely, clinically important differences between groups may be observed but not reach statistical significance, particularly when the sample size is small and the estimate is imprecise. Clinicians should apply a combination of clinical acumen, experience, and established MID values in the literature to determine whether the observed treatment effect is clinically important.
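As an illustration of how such an interval is typically obtained (a standard large-sample Wald interval, not necessarily the exact analysis used in PANORAMA), the 95% confidence interval for the difference between two observed proportions \(\hat{p}_{1}\) and \(\hat{p}_{2}\) from groups of size \(n_{1}\) and \(n_{2}\) is:

\[
\left(\hat{p}_{1} - \hat{p}_{2}\right) \pm 1.96\sqrt{\frac{\hat{p}_{1}(1-\hat{p}_{1})}{n_{1}} + \frac{\hat{p}_{2}(1-\hat{p}_{2})}{n_{2}}}
\]

The width of this interval shrinks as the group sizes grow, which is why larger, adequately powered trials generally yield more precise estimates of the treatment effect.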

Assessing the applicability

Applicability of study findings is critical if observed treatment benefits are to translate into clinical benefits for patients in real-world settings. When assessing the applicability of study results, there are several factors to consider. Firstly, the study population in the trial ought to be comparable to the clinician’s patient population. If the inclusion/exclusion criteria used in the trial are too strict, the observed treatment effect may not translate to the clinician’s general patient population. For example, an RCT of a new minimally invasive glaucoma surgery (MIGS) in treatment-naive patients may not be applicable to patients in the clinic who are on two or three glaucoma drops. Secondly, the feasibility of intervention delivery and the expertise of the health-care provider are important factors affecting applicability. Particularly for surgical interventions, the level of expertise can have significant effects on treatment outcomes [9, 10]; a clinician must be able to deliver the treatment adequately to achieve an optimal treatment effect. Additionally, the intervention must be reasonable with regard to the compliance demands it places on the patient in the “real-world” setting. A treatment may be effective in a controlled environment, but if it creates unreasonable demands on the patient or health-care provider, its effectiveness may differ in a real-world setting. A classic example is the initial RCTs on intravitreal anti-VEGF therapy for age-related macular degeneration, which required fixed monthly therapy over 24 months [11]. The results of these RCTs could not be replicated in real-world settings because adherence to monthly intravitreal anti-VEGF therapy was not practical for many patients. Thus, applicability is a key consideration for the clinician when deciding whether to adjust their clinical practice based on the results of an RCT.

Conclusion

The ability to interpret RCTs through an understanding of their risk of bias, statistical power, treatment effect, and applicability is critical for clinicians to make sound decisions in the field.