Introduction

Anti-vascular endothelial growth factor (VEGF) therapies have revolutionised the treatment of neovascular age-related macular degeneration (nAMD). Visual outcomes following anti-VEGF1, 2, 3, 4 have been unparalleled by previous therapies, which included laser photocoagulation5, 6 and photodynamic therapy.7 The cost-effectiveness of anti-VEGF drugs depends, however, on the establishment of an early and accurate diagnosis. Fundus fluorescein angiography (FFA) interpreted by an ophthalmologist is the current reference standard for the diagnosis of nAMD,8, 9 as it directly detects the presence of the neovascular tissue that fills with the dye. However, FFA is an invasive and time-consuming test. Alternative diagnostic technologies are available, of which the most widely used is optical coherence tomography (OCT). OCT, including time-domain (TD-OCT) and the more recently developed spectral domain (SD-OCT), is a lightwave-based technology that images the retina, obtaining ‘sections’ of this neovascular tissue with scan rates and resolution parameters that have greatly improved over the past 10 years. OCT is a non-invasive, non-contact visual test that requires about 5–10 min to assess both eyes. It is user friendly, typically undertaken by trained medical photographers and interpreted by ophthalmologists. It might also lead to efficiencies by allowing other categories of health professionals to become involved in the diagnosis of patients.

The aim of this systematic review was to evaluate the diagnostic accuracy, interpretability, and acceptability of OCT, alone or in combination with other tests, for the assessment of newly presenting patients suspected of having nAMD.

Materials and methods

The index test performed was OCT. Both TD-OCT and SD-OCT were considered. Comparator tests were clinical evaluation with slit lamp biomicroscopy (with or without the use of a contact lens), visual acuity, Amsler grid, colour fundus photography, infra-red reflectance, red-free images, fundus autofluorescence imaging (FAF), indocyanine green angiography (ICGA), preferential hyperacuity perimetry (PHP), and microperimetry. The reference standard was FFA interpreted by an ophthalmologist. However, as few studies reported individual ophthalmologist-interpreted FFA (rather than reading centre-interpreted FFA), studies using FFA as the reference standard but with unclear information about which type of healthcare professional interpreted the images were also considered. Participants were individuals presenting with symptoms of nAMD. Types of studies considered were direct (head-to-head) comparisons, in which all the participants received the index test and the reference standard, and indirect comparisons (eg, case–control studies) in which estimates of the accuracy of the respective tests were obtained in different study groups. Randomised controlled trials (RCTs) evaluating effectiveness outcomes where treatment was based on OCT compared with FFA findings were also included, as were studies evaluating the acceptability and/or interpretability of the test.

Published, unpublished, and ongoing studies were identified from literature searches of electronic databases (from 1995 to March 2013) and appropriate websites. There were no language restrictions. Databases searched included MEDLINE, MEDLINE In-Process, EMBASE, Biosis, and Science Citation Index for all reviews (see Supplementary Appendix 1). The Cochrane Central Register of Controlled Trials was searched for additional reports on RCTs reporting effectiveness outcomes, and PsycINFO and ASSIA for studies reporting acceptability data. The Cochrane Database of Systematic Reviews, Database of Abstracts of Reviews of Effects, MEDION, and HTA database were searched for relevant systematic reviews and HTA reports. Abstracts and presentations from recent conferences (from January 2009 to September 2012) of the American Academy of Ophthalmology, the Association for Research in Vision and Ophthalmology, and the European Association for Vision and Eye Research were searched, as were the WHO International Clinical Trials Registry Platform, ClinicalTrials.gov, and EU Clinical Trials Register for ongoing studies. Websites of key journals, professional organisations, and manufacturers of equipment were also consulted. Reference lists of all the included studies were also evaluated for possible inclusion and authors were contacted for details of additional potentially relevant reports.

Two reviewers independently screened the titles and abstracts (if available) of all the reports identified by electronic searches. Full-text copies of all potentially relevant papers were obtained and two reviewers independently assessed them for inclusion. Data extraction was undertaken by one reviewer (MMC) and checked by a second (AAB or GM). The risk of bias and applicability concerns of included full-text studies were assessed by two reviewers independently using an adapted version of the updated quality assessment of diagnostic accuracy studies (QUADAS-2) checklist.10 This checklist is designed to be adapted to the specific review topic. Disagreements were resolved by consensus or arbitration by a third reviewer.

The results of the individual studies were tabulated and, when possible, sensitivity, specificity, predictive values, likelihood ratios, and diagnostic odds ratios (DORs) were calculated. Summary receiver operating characteristic (SROC) curves were produced for each test where two or more diagnostic studies provided sufficient data. Meta-analysis models were fitted using the hierarchical summary receiver operating characteristic (HSROC) model11 in SAS version 9.1 (SAS Institute Inc., Cary, NC, USA). A symmetric SROC model was used, which takes proper account of the diseased and non-diseased sample sizes in each study, and allows estimation of random effects for the threshold and accuracy effects. The SROC curves from the HSROC models were produced on the corresponding SROC plots. Summary sensitivity, specificity, positive and negative likelihood ratios, and DORs for each model were reported as point estimate and 95% confidence interval (CI).
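As an illustration of the per-study measures listed above, the following minimal sketch derives them from a single 2×2 table of index test results against the FFA reference standard. It is not the review's own analysis code (the pooled estimates came from the HSROC model fitted in SAS); the names `TwoByTwo` and `accuracy_measures` and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cross-classification of an index test against the reference standard."""
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative


def accuracy_measures(t: TwoByTwo) -> dict:
    """Return the standard accuracy measures for one study.

    Assumption: every cell is non-zero. Tables with an empty cell need a
    continuity correction (or another method) before ratios are taken,
    so this sketch refuses them rather than guessing.
    """
    if min(t.tp, t.fp, t.fn, t.tn) <= 0:
        raise ValueError("all four cells must be positive in this sketch")

    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": t.tp / (t.tp + t.fp),   # positive predictive value
        "npv": t.tn / (t.tn + t.fn),   # negative predictive value
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
        "dor": (t.tp * t.tn) / (t.fp * t.fn),  # diagnostic odds ratio
    }


# Hypothetical counts, not taken from any included study.
example = accuracy_measures(TwoByTwo(tp=90, fp=10, fn=10, tn=40))
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

The DOR equals the ratio of the positive to the negative likelihood ratio, which is why it can be computed directly from the four cell counts.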

Results

A total of 4682 titles and abstracts were identified, of which 179 reports were selected for full-text assessment. Twenty-two studies (24 reports12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35) met the inclusion criteria. One hundred and fifty-five reports were excluded as they failed to meet one or more of the inclusion criteria in terms of study objective and design, participants, test, reference standard, or outcomes reported (Figure 1).

Figure 1

Flow diagram outlining the screening process.

Of the 22 included studies, 7 were retrospective, 9 were prospective, and 1 study24 had both a prospective and retrospective component; the remainder did not provide this information. The studies enrolled 2124 participants, of which a total of 1754 eyes with suspected nAMD were available for analysis. In 10 studies, the participant recruitment was consecutive. A summary of the characteristics of the included diagnostic studies is presented in Table 1.

Table 1 Summary of the characteristics of the included diagnostic studies

Twenty studies were full-text papers and two were available as abstracts.19, 27 Four studies (five reports) were written in non-English languages (Fujii 1996 (Japanese),16 Chen 2003 (Chinese),14 Krebs 2007 (German),22 and Torron 2001 and Torron 2002 (Spanish)).34, 35 Four studies reported the diagnostic performance of more than one test within the same population (Cachulo 2011 reported TD-OCT, ICGA, and FAF;13 Do 2012 reported TD-OCT, Amsler grid and PHP;15 Alster 2005 reported PHP, colour fundus photography and visual acuity;12 Sandhu 2005 reported TD-OCT plus colour fundus photography30). The prevalence of nAMD in these studies ranged from 17.2 to 100% (median 80.0%).

All of the studies reported the eye as the unit of analysis (1754 eyes) except for one study where the unit of analysis was the patient (155 patients).27

Regarding the risk of bias (Figure 2), the domains with the greatest number of studies judged to be at high risk of bias were the patient selection domain (55%, 11/20) for reasons such as inappropriate exclusions and pre-selection of participants, and the flow and timing domain (40%, 8/20) for reasons such as the length of time between the index test and reference standard being longer than one week, and not all participants being included in the analysis. None of the studies were judged to have low risk of bias across all the domains. All reports were judged to have low concerns for applicability, in that the participants and setting, index/comparator test and target condition as defined by the reference standard were considered to match the question being addressed by the review.

Figure 2

Summary of the risk of bias and applicability domains.

Regarding the diagnostic performance of OCT, Figure 3 shows a forest plot of the sensitivity and specificity of 10 studies. Across these studies, the median (range) sensitivity and specificity values reported were 94.5% (36–100%) and 73.5% (66–94%), respectively. Only four studies (all TD-OCT) reported specificity. For TD-OCT, across the studies, the median (range) sensitivity and specificity values reported were 92.3% (36–100%) and 73.4% (66–94%), whereas the only SD-OCT study reported sensitivity of 100% and did not report specificity. Four studies, all TD-OCT, provided sufficient data for inclusion in a meta-analysis using HSROC methodology; the results are presented in Figure 4 and the pooled estimates for these studies are shown in Table 2. The pooled sensitivity and specificity (95% CI) were 88% (46–98%) and 78% (64–88%), respectively. The meta-analysis findings are broadly supported by the descriptive analyses involving a larger number of studies, shown in Figure 3.
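To show what a pooled sensitivity of 88% and specificity of 78% would mean for an individual test result, the sketch below converts the two point estimates into likelihood ratios and predictive values using Bayes' theorem. This is a back-of-envelope illustration only: it ignores the confidence intervals, it is not how the Table 2 estimates were derived (those came from the HSROC model), and applying the lowest (17.2%) and median (80.0%) prevalence values reported for the included studies is an assumption made here for illustration. The function names are hypothetical.

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    return sens / (1 - spec), (1 - sens) / spec


def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Positive and negative predictive values at a given pre-test prevalence."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Pooled TD-OCT point estimates quoted in the text.
SENS, SPEC = 0.88, 0.78

lr_pos, lr_neg = likelihood_ratios(SENS, SPEC)
print(f"LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}")

# Assumed pre-test prevalences: lowest and median values reported for the included studies.
for prevalence in (0.172, 0.80):
    ppv, npv = predictive_values(SENS, SPEC, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

The same sensitivity and specificity give very different predictive values at the two prevalences, which is one reason the high prevalence in the included studies matters when interpreting the results.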

Figure 3

Individual study results for all OCT diagnostic studies reporting sensitivity and/or specificity.

Figure 4

Individual study results reporting sensitivity and specificity for TD-OCT.

Table 2 Pooled estimates for the OCT diagnostic studies

In descriptive analyses, across the studies reporting other tests, median sensitivity was high for ICGA (93.2%, range 84.6–100%; four studies) and FAF (93.3%; one study), followed by PHP (81.5%, range 50.0–84.8%; three studies), colour fundus photography (70%; one study), and lowest for Amsler grid (41.7%; one study). Specificity was highest for colour fundus photography (95%; one study), followed by PHP (84.6 and 87.7%; two studies), and was low for FAF (37.1%; one study) and ICGA (36.8%; one study). Figure 5 shows forest plots of the sensitivity and specificity of the PHP and ICGA studies.

Figure 5

Individual study results for sensitivity and specificity.

Six studies provided some information related to the interpretability of the tests, reporting the numbers excluded from the analysis due to poor image quality.12, 13, 15, 21, 24, 30 For studies reporting OCT, the percentages of images excluded from analysis were 2.7% (35/1307 eyes),21 5.8% (6/104),15 and 7.8% (10/128 individuals),30 although it was unclear whether all of the excluded images related to OCT or whether some might also relate to FFA.

Discussion

We identified a relatively small body of evidence comparing OCT and other tests with a reference standard of FFA for the diagnosis of nAMD. The majority of studies reported TD-OCT (12 studies) and only one study reported SD-OCT; therefore, meta-analysis was possible only for TD-OCT. The pooled sensitivity for TD-OCT was relatively high at 88% and the pooled specificity was moderate at 78%. There was insufficient information to compare the performance of different diagnostic technologies, or to address the acceptability of the tests or the clinical effectiveness when treatment was based on OCT compared with FFA findings, and little information was available on the interpretability of the tests.

Descriptive analyses involving a larger number of studies broadly supported the meta-analysis findings. Of the alternative tests, median sensitivity was similarly high for ICGA (93.2%, 4 studies) and FAF (93.3%, 1 study), followed by PHP (81.5%, 3 studies) and colour fundus photography (70.0%, 1 study) but low for the Amsler grid (41.7%, 1 study). Specificity was highest for colour fundus photography (95%, 1 study), followed by PHP (84.6 and 87.7%; 2 studies), and was low for FAF (37.1%; 1 study) and ICGA (36.8%; 1 study). Our study is a systematic diagnostic accuracy review following robust methods and not limited to studies reported in English. An HSROC model was used for the analysis, which takes account of the trade-off between true/false positives and models between-study heterogeneity.36 Pooled estimates for TD-OCT were derived but were not possible for SD-OCT due to insufficient data. There was also insufficient information to address the questions of the clinical effectiveness of OCT compared with FFA, or the acceptability of the tests.

FFA interpreted by an ophthalmologist was the reference standard test for our study and was assumed to have perfect sensitivity and specificity. Consequently, it was not possible to address the question of whether OCT might actually be a better diagnostic test than FFA and have higher sensitivity or specificity than the current reference standard. One approach that has been suggested for determining when a new test should replace the reference standard is that proposed by Glasziou et al.37 Glasziou et al suggested the use of a third, ‘fair umpire’ test, which although potentially less accurate than either the new test or the reference standard, nevertheless could be considered a fair umpire if its errors were considered to be independent of the other tests.37 However, the authors acknowledged that this would usually be difficult to demonstrate. Unfortunately, none of the included OCT studies involving a third test provided a sufficient level of detail to allow us to explore this approach.

In two studies,21, 28 some participants were classified as having nAMD who were negative on FFA but positive on one of the other tests being assessed (13/541 eyes by TD-OCT in the Kozak 2008 study and 4/20 participants by ICGA in the Reichel 1995 study). For the purposes of our study, these were considered to be test false positives (as the reference standard of FFA was considered to have perfect sensitivity and specificity).

Our searches identified four health-technology assessment reports that included an assessment of OCT in the detection of nAMD.38, 39, 40, 41 The German report by Stürzlinger et al39 concluded that although OCT yielded diagnostic findings in addition to FFA results, OCT could not replace FFA for initial diagnosis. The Belgian Health Care Knowledge Centre report, similar to this review, reported high sensitivity (96–97%) and moderate specificity (66%) for OCT.41 However, the Australian MSAC report concluded that due to the absence of a valid reference standard, the diagnostic accuracy of OCT could not be assessed.40 The report by the Medical Advisory Secretariat, Ontario, Canada also questioned the validity of FFA as a reference standard and presented conclusions that were based on expert consultations.38

In conclusion, we identified a relatively small number of studies, of variable quality, on the performance of OCT in the diagnosis of people newly presenting with a suspicion of nAMD. The available evidence suggests that although TD-OCT is a relatively sensitive test for the initial diagnosis of nAMD, it is of moderate specificity. Consequently, it should not be used as the only test to diagnose nAMD. The current evidence suggests that TD-OCT should not replace the reference standard of FFA in the diagnosis of nAMD. Further research is required for evaluating the diagnostic performance of SD-OCT for nAMD.