Introduction

Historically, training in ophthalmology, as in other surgical specialties, has been based on a Halstedian model of apprenticeship learning, in which trainees are assumed to be competent upon completing a minimum number of surgical procedures. Changes to the clinical environment and to professional values have forced a review of this approach [1]. One problem associated with this model is inconsistency in the levels of knowledge and skills gained, owing to variations in clinical exposure and educational opportunities [2]. Using the total number of procedures a trainee has performed as a benchmark for skill is also problematic, as quantity does not equate to quality and competency cannot be accurately discerned in this way. Reductions in training hours due to regulations such as the European Working Time Directive further limit potential training opportunities [3]. Furthermore, growing ethical concerns over the use of patients for training purposes [4] are also having a major impact on training, particularly in the early stages of the learning curve. Studies have shown a close correlation between experience and complication rate [5, 6].

These issues highlight the need for improved training programmes in which proficiency is developed and objectively assessed before trainees treat patients. Simulation models offer a platform for trainees to improve their clinical and surgical skills, enabling focussed, competency-based training without putting patients at risk. The healthcare sector continues to make rapid technological advances, and the development of simulator models as safe and effective tools for training and assessment has grown dramatically. This trend has been observed within the field of ophthalmology [7], but the extent to which simulation is used varies widely between training programmes, and its role remains limited by a lack of formal, standardised integration into existing curricula.

The purpose of this systematic review is to comprehensively evaluate the effectiveness and validity of all simulator models developed for ophthalmic training to date.

Methods

This review was carried out following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) statement and was registered on the international prospective register of systematic reviews, PROSPERO, prior to the conduct of the study (registration number: CRD42018087929).

Eligibility criteria

All original studies were included if they described simulation or e-learning for technical or non-technical skills development in ophthalmic training. Eligible participants were ophthalmologists of any grade and medical students who had completed or were completing their ophthalmology attachment. Studies were excluded if they did not provide original data, were not specific to ophthalmology, or did not use simulation for training or assessment purposes. All papers were included irrespective of language.

Search methods

A systematic search of PubMed and Embase was carried out using the terms “(simulat* OR virtual reality OR wet lab OR cadaver OR model OR e-learning) AND ophthalm* AND (training OR programme OR course)”. The search covered all records from inception to 1 July 2019. Reference lists from included articles and relevant reviews were hand-searched for eligible studies.

Study selection

Two authors, RL and WYL, carried out independent, duplicate searches. All abstracts were reviewed and articles that were potentially eligible were read in full. A final list of studies meeting the eligibility criteria was compared and disagreements resolved by discussion (Fig. 1).

Fig. 1: PRISMA Flow Diagram.

Flow diagram of study selection process.

Data collection

The same two authors extracted data for each study separately and differences were resolved through discussion. Data collected included details of the simulator model, type of study design, number of participants and their training level, training task(s) involved, duration of training, and outcome data addressing validity and effectiveness of the model.

Data analysis

Studies were grouped according to simulator type: virtual reality; wet lab (live or cadaveric animal models and human cadaveric models); dry lab (synthetic models); and e-learning models. Validity was evaluated based on Messick’s modern validity framework [8] and the strength of each source of validity evidence was measured using a validated rating scale [9]. Effectiveness was quantified using an adaptation of McGaghie’s proposed levels of simulation-based translational outcomes (Table 1) [10]. Qualitative analysis was carried out due to the heterogeneity of study designs.

Table 1 Details of the frameworks used for evaluation of validity and educational impact.

Results

A total of 3989 articles were screened, of which 3751 were excluded following abstract review. After reading the remaining 238 articles in full, a further 107 were excluded. A total of 131 original articles were included in this systematic review (Fig. 1). Details of findings are summarised in Tables 2–5 according to simulator type.

Table 2 Virtual reality studies.
Table 3 Wet-lab studies.
Table 4 Dry-lab studies.
Table 5 E-learning studies.

Virtual reality

Eyesi Surgical

The Eyesi Surgical (VRmagic, Mannheim, Germany) is a high-fidelity virtual reality simulator designed for practising intraocular procedures. It consists of a mannequin head housing a model eye, connected to a computer interface and an operating microscope. The movements and positions of the surgical instruments are tracked by internal sensors, producing a virtual image that is viewed through the microscope as well as on a separate touchscreen. The software contains training modules that simulate different steps of cataract and vitreoretinal surgery. The system records performance metrics, enabling scores and feedback to be generated [11]. Of all the virtual reality simulators developed for ophthalmology training, the Eyesi has been the most extensively assessed, with a total of 33 validity studies.

Cataract surgery

[Summary: content = 2; response processes = 1; internal structure = 2; relations to other variables = 2; consequences = 2; translational outcomes = level 5].

Twenty-eight studies assessed the Eyesi cataract training modules, collectively demonstrating all five sources of validity evidence, with data strongly supporting each parameter (score = 2) except response processes, for which evidence was more limited (score = 1) [11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38]. A randomised controlled trial (RCT) by Feudner et al. showed that those who trained with the Eyesi achieved significant improvements in their capsulorhexis performance in the wet lab compared with a no-training control group [14]. Another RCT suggested that virtual reality training was comparable to wet-lab training [13]: residents were assessed on their first capsulorhexis in the operating room following either Eyesi or wet-lab training, and overall technical scores were equivalent. The study also provided evidence of predictive validity, with a direct correlation between time taken to complete the training modules on the Eyesi and actual operating room time, as well as overall performance score.

Regarding patient outcomes, five studies demonstrated the transfer effects of the Eyesi, with reduced complications in live cataract surgery following training [12, 21, 29, 35, 38]. Of note, a multi-centre retrospective study involving 265 ophthalmology trainees across the UK showed that complication rates dropped from 4.2 to 2.6% (a 38% reduction) following the introduction of Eyesi simulators into training programmes [35]. Similarly, a study by Baxter et al. demonstrated that a structured curriculum combining wet-lab and Eyesi training led to a considerable reduction in complication rates compared with reported figures for traditional training programmes [38]. However, a recent study also testing transfer of skills showed some limitations of Eyesi training [36]. Performance during Eyesi training was compared with subsequent performance in theatre: improvements in operating room performance were observed only for less experienced ophthalmologists, and Eyesi scores could discriminate between novice and experienced surgeons only in the first few training sessions.
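To make the headline reduction explicit (this is our arithmetic from the reported rates, not additional data from the study): the 38% figure is a relative reduction, whereas the absolute reduction is 1.6 percentage points:

\[
4.2\% - 2.6\% = 1.6\ \text{percentage points (absolute)}, \qquad \frac{4.2 - 2.6}{4.2} \approx 38\%\ \text{(relative)}.
\]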

Vitreoretinal surgery

[Summary: content = 1; response processes = 1; internal structure = N; relations to other variables = 1; consequences = N; translational outcomes: level 2].

Only four studies have evaluated the vitreoretinal modules on the Eyesi Surgical simulator [39,40,41,42]. These studies support the content validity of vitreoretinal surgery training, as well as response processes and relations to other variables. As with cataract surgery training, scores on the vitreoretinal modules discriminated between experienced and inexperienced surgeons. One study reported evidence for response processes through standardisation of testing and assessment, such as allocating set time periods for training, standardised instructions and use of the same supervisor, although this evidence remains limited at best [43]. Studies on the vitreoretinal modules also demonstrated a learning curve, with overall scores increasing and completion time decreasing over repeated attempts, indicating contained effects of using the Eyesi for vitreoretinal training. No evidence has been published to support internal structure, consequences, or transfer of skills to the operating room.

MicroVisTouch

[Summary: content = 0; response processes = N; internal structure = N; relations to other variables = 1; consequences = N; translational outcomes = N].

The MicroVisTouch (ImmersiveTouch, Inc, Chicago, USA) is another commercially available virtual reality simulator that was introduced after the Eyesi, with a report of the prototype published in 2012 [44]. Unlike the Eyesi, the MicroVisTouch features a single handpiece that is attached to a robotic arm and is used to control the appropriate instrument according to the procedure being simulated. It also differs from the Eyesi in that it has an integrated tactile feedback interface, reportedly the first ophthalmic simulator to have this feature [45]. Currently, simulation is limited to three key steps in cataract surgery (clear corneal incision, capsulorhexis and phacoemulsification), although further modules are being developed.

Compared with the Eyesi, fewer studies have assessed the MicroVisTouch. Two groups have reported, implicitly, that the simulator demonstrates content validity for simulating capsulorhexis and that there is evidence of relations to other variables [44, 45], but other sources of validity evidence are lacking, as is evidence supporting the effectiveness of the simulator. A third group adapted the MicroVisTouch by customising the algorithm and integrating OCT (optical coherence tomography) scans of varying vitreoretinal conditions into the simulator, enabling patient-specific simulation training of vitreoretinal procedures (epiretinal membrane and internal limiting membrane peeling) [46]. However, the validity and effectiveness of this model were not tested in the original study and no further reports have been found.

Eyesi Ophthalmoscopes

Direct

[Summary: content = 1; response processes = 2; internal structure = 2; relations to other variables = 2; consequences = 1; translational outcomes: level 2].

The Eyesi Direct Ophthalmoscope (VRmagic, Mannheim, Germany) is a virtual reality simulator that enables fundoscopy examination practice, consisting of an ophthalmoscope handpiece with built-in display and a patient model head connected to a touchscreen. A range of patient cases and pathologies can be selected from the programme and objective feedback is provided based on the trainee’s performance [47].

Although only two studies were found evaluating this simulator, there was strong evidence for its validity. Borgersen et al. published the only study in this review to assess validity using all five parameters of Messick’s framework, showing the consequences of applying a set pass/fail score that accurately discriminated between inexperienced participants (medical students), who failed, and experienced participants (ophthalmology consultants), who all passed [48]. The second study showed that participants who trained with the simulator achieved higher scores in an OSCE (Objective Structured Clinical Examination) assessment than a control group who received only conventional training, demonstrating contained effects for translational outcomes [49].

Indirect

[Summary: content = 0; response processes = N; internal structure = N; relations to other variables = 1; consequences = N; translational outcomes: level 1].

The Eyesi Indirect Ophthalmoscope (VRmagic, Mannheim, Germany) is similar to the Eyesi Direct, consisting of an ophthalmoscope headband connected to a display that shows a 3D virtual patient, with virtual lenses appearing when physical diagnostic lenses are placed over the model head. As with the Eyesi Direct, physiological and pathological features of the virtual patient can be controlled and varied.

Only two studies were found for this simulator [50, 51]. In contrast to the Eyesi Direct, validity evidence was limited to relations to other variables, as one study showed that the simulator could discriminate between medical students and ophthalmology trainees [50]. Evidence of effectiveness was limited to internal acceptability, as participants gave positive feedback on their experience of using the simulator.

Others

A variety of other virtual reality simulators have also been described, including three models for cataract surgery [52,53,54]; five for vitreoretinal surgery [55,56,57,58,59]; one for endoscopic endonasal surgery [60]; two for general ophthalmic surgery [61, 62]; one for ophthalmic anaesthesia [63]; one for ocular ultrasound [64]; and one for indirect ophthalmoscopy [54]. However, these have all been stand-alone reports with limited evidence of content validity only (scores of 0 or 1). An exception is the Endoscopic Endonasal Surgery Simulator by Weiss et al., which was tested in an RCT and demonstrated good internal structure [60]. Effectiveness was tested in only four models, with the Sophocle retinal photocoagulation simulator shown to be the most effective (downstream effects), as live assessment on real patients showed that the simulator group performed similarly to a control group who had previously practised on patients [58]. As with the other descriptive models, these simulators have not been investigated further.

Wet lab

A total of 47 studies on wet-lab models were found, of which 12 used mixed models combining tissue with an inanimate device or artificial system. Of the animal model studies, 22 used porcine specimens, three used sheep specimens, four used goat eyes and three used rabbit eyes. Seventeen studies used human cadaveric eyes or isolated lenses, of which three combined these with animal tissue.

Cataract surgery

[Summary: content = 2; response processes = 0; internal structure = N; relations to other variables = N; consequences = N; translational outcomes = level 2].

There were 16 studies describing the use of wet-lab models for cataract surgery [65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80]. These demonstrated content validity only, with no evidence for the other validity parameters. The models showing the strongest evidence were pig eyes filled with cooked chestnuts for practising phacoemulsification [71] and rabbit eyes fixed with paraformaldehyde for simulating capsulorhexis [74]; these demonstrated contained effects and internal acceptability, respectively.

Vitreoretinal surgery

[Summary: content = 0; response processes = N; internal structure = N; relations to other variables = N; consequences = N; translational outcomes = N].

One study described the use of rabbit eyes for performing pars plana vitrectomy, from which content validity could be inferred [81]. However, all other sources of validity evidence and indications of effectiveness were lacking.

Glaucoma surgery

[Summary: content = 1; response processes = 2; internal structure = N; relations to other variables = 1; consequences = N; translational outcomes = level 2].

Six studies were found for glaucoma surgery [82,83,84,85,86,87], most lacking formal validity assessment. One study, which tested the placement of human cadaveric eyes into a model head, Marty the Surgical Simulator (Iatrotech Inc., Del Mar, USA), for goniotomy simulation, demonstrated good response processes and evidence of internal acceptability [84]. Dang et al. also showed that performing trabeculectomies on porcine eyes, with added canalograms for outflow quantification, provided some evidence for relations to other variables and contained effects [82].

Corneal surgery

[Summary: content = 1; response processes = N; internal structure = N; relations to other variables = 2; consequences = N; translational outcomes = 1].

The use of wet-lab models for practising corneal surgery has been described in four studies [88,89,90,91]. Content validity and relations to other variables were demonstrated in one study [91], which tested the feasibility of simulating Descemet’s membrane endothelial keratoplasty on human corneas mounted in an artificial anterior chamber with a 3D-printed iris. However, evidence for other validity parameters and effectiveness was not demonstrated in the remaining studies.

Strabismus surgery

[Summary: content = 0; response processes = 0; internal structure = N; relations to other variables = N; consequences = N; translational outcomes = level 2].

Two wet-lab models were found for strabismus surgery, both using porcine eyes. White et al. added bacon to the eyes to simulate extraocular muscles [92], whereas Vagge et al. asked residents to practise on a chicken breast model followed by the pig eyes [93]. Content validity and response processes were discussed in both studies but no supporting data were reported. Internal acceptability and contained effects were demonstrated for the two models, respectively.

Oculoplastic surgery

[Summary: content = 2; response processes = 0; internal structure = N; relations to other variables = N; consequences = N; translational outcomes = level 2].

Four studies described the use of wet-lab oculoplastic simulators [94,95,96,97]. All demonstrated content validity, with the study by Pfaff showing the strongest evidence for this parameter [95]. One group showed that a split pig head used for practising lid procedures had good internal acceptability [94], and another group, using human cadaver eyes, showed that trainees improved in comfort, confidence and technical skill when performing canthotomy and cantholysis procedures [97].

Orbital surgery

[Summary: content = 0; response processes = N; internal structure = N; relations to other variables = N; consequences = N; translational outcomes = N].

Altunrende et al. described using a sheep cranium to practise ocular dissection for orbital surgery. Content validity was reported, but the effectiveness of the model was not tested [98].

Ocular trauma

[Summary: content = 1; response processes = N; internal structure = N; relations to other variables = 0; consequences = N; translational outcomes = level 1].

A recent study by Mednick et al. showed that placing iron particles on human cadaver eyes to simulate corneal rust ring removal provided evidence of content validity and relations to other variables [99]; internal acceptability was also high. Another study on ocular trauma surgery described the use of goat eyes for practising corneoscleral perforation repair [100]. However, as the study was purely descriptive, its validity and effectiveness could not be assessed.

Diagnostic examination

[Summary: content = 0; response processes = N; internal structure = N; relations to other variables = N; consequences = N; translational outcomes = N].

One study by Uhlig and Gerding tested the use of porcine eyes placed inside an adjustable artificial orbit for practising direct and indirect fundoscopy, as well as gonioscopy [101]. As this was a descriptive study, no evidence for validity or effectiveness was provided.

Others

The remaining wet-lab models were used to simulate either a wide range of anterior and/or posterior segment surgeries or general micro-surgical skills [102,103,104,105,106,107,108,109,110,111].

Only two models, both using porcine eyes for micro-surgical skills assessment, provided data supporting their validity. Ezra et al. investigated the use of a video-based, modified Objective Structured Assessment of Technical Skill (OSATS) tool, demonstrating good internal structure, with high inter-rater reliability, and relations to other variables, with significant correlation between OSATS scores and results from a separate motion-tracking device [109].

The Eye Surgical Skills Assessment Test (ESSAT), involving the use of porcine eyes and feet as part of a three-station assessment, demonstrated all five sources of validity evidence. One study showed, via a panel of ophthalmic surgery experts, strong evidence of content validity [110]. A further masked study demonstrated that the ESSAT had strong inter-rater reliability (internal structure) and that the senior resident in the study scored higher than the junior resident (relations to other variables) [111]. Unlike the authors of other models, the study authors also went on to discuss the potential consequences of using the ESSAT as an assessment tool, weighing the benefits of setting a competence score that trainees must meet before operating on real patients against the risk of the ESSAT becoming a stressful test that deters less confident residents from entering the operating room. The effectiveness of the test itself, however, was not assessed.

Altogether, the wet-lab studies that assessed effectiveness evaluated only responses to participant surveys (internal acceptability) [105] and performance improvements on the models themselves (contained effects) [108, 109]; downstream effects were not demonstrated.

Dry lab

Twenty-six studies on synthetic models were identified, of which eight were developed for practising diagnostic examination techniques (slit lamp, direct and indirect ophthalmoscopy), six for vitreoretinal surgery, one for strabismus surgery, four for laser procedures, two for orbital surgery, one for cataract surgery, one for oculoplastic surgery, one for ocular trauma, one for general ophthalmic surgery and one for combined fundoscopy examination and laser procedures.

Cataract surgery

[Summary: content = 0; response processes = N; internal structure = N; relations to other variables = N; consequences = N; translational outcomes = level 2].

Abellán et al. developed a low-cost cataract surgery simulator using a methacrylate support and aluminium foil for capsulorhexis simulation [112]. This was the only inanimate simulator to be tested in an RCT and demonstrated transfer effects: those who trained using the model achieved a higher proportion of satisfactory capsulorhexes in subsequent practice on animal eye models than those who had begun training on the animal eyes. Further validity evidence was lacking from the study.

Vitreoretinal surgery

[Summary: content = 0; response processes = N; internal structure = N; relations to other variables = 1; consequences = N; translational outcomes = level 2].

For vitreoretinal surgery, two different validated models were found. Hirata et al. used quail eggs within a silicone cap to simulate membrane peeling; the model was shown to discriminate between experienced and inexperienced surgeons in terms of operating time and the success rate of membrane peeling (relations to other variables) [113]. Yeh et al. tested this similarly using the artificial VitRet Eye Model (Phillips Studio, Bristol, UK), filled with vitreous-like fluid to simulate a variety of vitreoretinal procedures, and observed a positive correlation between trainees’ level of experience and total score [114]. In terms of effectiveness, the models by Yeh and Hirata showed internal acceptability and contained effects, respectively.

Other dry-lab models for vitreoretinal surgery included an artificial orbit with diascleral illumination [115]; a modified rubber eye [116]; a medium-fidelity model constructed from a wooden frame and a tennis ball simulating the globe [117]; and an artificial eye with an internal limiting membrane made from hydrogel [118]. However, these models were only described, with no assessment of validity or effectiveness.

Strabismus

[Summary: content = 1; response processes = 2; internal structure = 2; relations to other variables = N; consequences = N; translational outcomes = level 2].

One study was found on the use of a low-fidelity, dry-lab model for strabismus surgery simulation [119]. The model consisted of a rubber ball simulating the globe, elastic bands simulating the recti muscles, and a piece of latex simulating the conjunctiva and cornea. Results showed no significant differences in performance between this dry-lab model and a higher-fidelity wet-lab model. The study showed strong evidence of valid response processes and internal structure: the authors performed a pre-randomisation test to determine baseline dexterity and used stratified randomisation to assign participants to two groups with equal baseline dexterity. The process for evaluating the participants’ skills after training was also robust, as performance was assessed by two independent ophthalmologists using three validated assessment scales (ICO-OSCAR, OSATS and ASS).

Laser surgery

[Summary: content = 0; response processes = N; internal structure = N; relations to other variables = 1; consequences = N; translational outcomes = level 2].

Of the four laser simulators found [120,121,122,123], only the model designed by Simpson et al. showed evidence of validity through relations to other variables [103]. The effectiveness of training with this model, however, was not investigated. Conversely, a capsulotomy simulator by Moisseiev and Michaeli demonstrated contained effects but was not tested for validity [121].

Oculoplastic surgery

[Summary: content = 0; response processes = 0; internal structure = N; relations to other variables = N; consequences = N; translational outcomes = N].

One oculoplastic surgery dry-lab model was found, using an anatomically correct skull model for simulating nasolacrimal duct surgery [124]. However, this was descriptive only, with no assessment of validity or effectiveness.

Orbital surgery

[Summary: content = 0; response processes = N; internal structure = N; relations to other variables = N; consequences = N; translational outcomes = N].

There were two studies using 3D-printed orbit models for simulating orbital surgery [125, 126]. However, as these were also descriptive only, evidence for their validity and effectiveness was not shown.

Trauma management

[Summary: content = 1; response processes = N; internal structure = N; relations to other variables = N; consequences = N; translational outcomes = level 1].

The Newport Eye is a simple training phantom using a craft eye, resins and ground black pepper to simulate corneal foreign body removal [127]. This demonstrated evidence for content validity as experts agreed that the model was realistic in terms of its tissue colour, consistency and anatomy. Trainees also reported being more confident with the procedure after using the model, demonstrating internal acceptability. However, this simulator does not appear to have been used by other groups and no further reports were identified.

Diagnostic examination

[Summary: content = 0; response processes = 0; internal structure = N; relations to other variables = 1; consequences = N; translational outcomes = 0].

The largest proportion of dry-lab models were designed for practising examination techniques, including slit lamp and fundoscopy (direct and indirect) [128,129,130,131,132,133,134,135,136]. Two studies tested the validity and effectiveness of the EYE Exam Simulator (Kyoto Kagaku Co., Japan), a popular tool for fundoscopy practice, which consists of a head model with adjustable pupil sizes and changeable fundus slides representing different retinal conditions. McCarthy et al. showed that response processes were generally poor: letters had been added to the slides to check participants’ field of vision through the ophthalmoscope, and the majority of participants were unable to identify these markers or the pathology on the slides [132]. Survey responses also indicated low user satisfaction, as trainees did not feel the model was realistic or that the exercise improved their skills. In contrast, Akaishi et al. showed a strong correlation between accuracy of examination on the EYE simulator and previous experience of performing fundoscopy in clinic, providing some evidence for its validity [128].

All other studies of ophthalmoscopy and slit-lamp simulators were descriptive, showing evidence of content validity only.

E-learning

Aside from training technical skills, tools have also been developed to improve cognitive and other non-technical skills such as teamwork and leadership. A total of five studies were found using this modality, the majority testing its use amongst medical students. All studies incorporated both training and assessment as part of the course.

Computer-Assisted Learning Ophthalmology Program

[Summary: content = 2; response processes = 2; internal structure = N; relations to other variables = N; consequences = N; translational outcomes = level 1].

The Computer-Assisted Learning Ophthalmology Program designed by Kaufman and Lee is a multi-media, interactive tutorial that aims to help medical students learn about the pupillary light reflex [137]. Content validity was demonstrated extensively by experts, and response processes were thoroughly assessed, as each student’s experience and thought process during training was evaluated by an external interviewer after the programme. Despite positive responses from all groups indicating a valid and effective simulator (internal acceptability), it has not been adopted by other medical schools and no further reports have been found.

Case-based e-learning modules with Q&A games

[Summary: content = 0; response processes = N; internal structure = N; relations to other variables = N; consequences = 1; translational outcomes = level 2].

A study by Stahl et al. also tested the consequences of using e-learning modules as a part of ophthalmology teaching for a group of 272 medical students [138]. Although validity parameters were not formally tested, the authors found that students who used e-learning more frequently achieved better exam results.

Ophthalmic Operation Vienna

[Summary: content = 0; response processes = N; internal structure = 2; relations to other variables = N; consequences = N; translational outcomes = level 2].

An RCT by the Medical University of Vienna evaluated the use of a 3D animated programme for learning the different steps of ophthalmic surgery [139]. This demonstrated strong internal structure, as a reliability analysis of the multiple-choice questions used at the end of the programme showed a Cronbach’s α coefficient of 0.7, indicating acceptable reliability. Those in the simulation group also outperformed the control group in the final test, showing contained effects.
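For context, Cronbach’s α for a test of \(k\) items is conventionally computed as follows, with values of 0.7 or above generally taken as acceptable internal consistency (this standard formula is background to the reliability analysis, not a detail reported in the study itself):

\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right),
\]

where \(\sigma^{2}_{Y_i}\) is the variance of scores on item \(i\) and \(\sigma^{2}_{X}\) is the variance of total test scores.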

3D computer animations for learning neuro-ophthalmology and anatomy

[Summary: content = 0; response processes = N; internal structure = N; relations to other variables = N; consequences = N; translational outcomes = level 2].

Glittenberg and Binder carried out a similar study, investigating the use of a combination of 3D design software packages for teaching complex topics in neuro-ophthalmology [140]. No evidence was provided supporting its validity. However, effectiveness was demonstrated, as students responded very positively to the programme in a satisfaction questionnaire (internal acceptability) and achieved significantly better results in a post-lecture test than the control group (contained effects).

The Virtual Mentor

[Summary: content = 2; response processes = N; internal structure = 0; relations to other variables = 1; consequences = N; translational outcomes = level 2].

Whilst most e-learning studies were designed for medical students, one model was developed to help ophthalmology residents develop non-technical skills. A multi-centre RCT tested the effects of using The Virtual Mentor, an interactive, computer-based programme teaching the cognitive aspects of performing hydrodissection in cataract surgery, including decision making and error recognition [141]. Test questions demonstrated good content validity, having been developed and modified by cataract surgery experts across nine academic institutions. Test scores also demonstrated relations to other variables, correlating with residency year of training. Despite the lack of data quantifying the reliability of the model, the study showed a degree of internal structure, as residents were randomised using a stratified design according to academic centre and residency year, factors that would likely have influenced test scores. Internal acceptability was demonstrated by positive user feedback, and contained effects by higher post-test scores and a greater mean increase from pre- to post-test in the simulator group compared with the control group.

Discussion

This systematic review of simulation training in ophthalmology provides a comprehensive evaluation of all available simulation tools using modern frameworks of validity and educational impact. Virtual reality simulators, and the Eyesi Surgical simulator in particular, were the most widely evaluated. For cataract surgery, evidence supporting all sources of validity has been reported. Critically, data support the downstream effects of Eyesi training, which has been shown to improve operating room performance and reduce complication rates. In contrast, other ophthalmic simulation training tools, including the vitreoretinal training modules for the Eyesi Surgical system, have undergone much more limited assessment. A wide variety of dry-lab and wet-lab training models were reported. The use of dry-lab models in ophthalmology was more limited than in other surgical specialities [142], with no evidence to suggest any model was particularly effective. In contrast, a relatively high number of wet-lab models was reported. In general, acceptability was high, with positive participant feedback, and there was evidence, albeit limited, to support the educational impact of wet-lab training. Cadaveric animal tissue was most commonly used, and no significant benefits of human over animal cadaveric models were reported. Only five studies reported the use of e-learning. These results support its potential for ophthalmology training, but further assessment is needed before incorporation into the training curriculum. Lastly, there was a paucity of studies addressing non-technical skills training in this area. The impact of human factors on patient safety is well recognised, as reflected by the rapid increase in non-technical skills training in medicine [143]. One study in this review, the Virtual Mentor e-learning programme, included cognitive components of cataract surgery training [141]. A pilot study by Saleh et al. also demonstrated the feasibility of using high-fidelity, immersive simulation for cataract surgery, using scenarios based on previous patient safety incidents and evaluating the cross-validity and reliability of four established assessment tools (OTAS, NOTECHS, ANTS and NOTSS) [144].

Simulation tools are increasingly being used for both formative and summative assessment of technical and non-technical skills. In this review, only one dedicated assessment tool, the ESSAT, was identified. Strong validity evidence has been shown for the ESSAT, but further research on standard-setting and application of the tool has not yet been performed. Effective skills assessment is becoming increasingly important, both to support competency-based training and to enable objective proficiency assessment. In response to growing calls for greater transparency and accountability, formal ongoing credentialing and certification are being considered to ensure doctors maintain the necessary skills and knowledge throughout their professional careers. Simulators are being used to provide objective skills assessment, but particularly in such high-stakes settings, rigorous validation of the assessment tools is required before they can be implemented.

Overall, the majority of studies lacked a formal validation process, with 45% (n = 59) being purely descriptive. Furthermore, most validity assessments used outdated validity frameworks, which greatly limits the value of their results. “Face validity” was commonly reported as validity evidence despite the recognition that such subjective assessment of a simulator’s perceived realism is largely irrelevant to its educational impact [145]. Likewise, the concept of construct validity based on expert–novice comparisons remains widely used but again offers little useful insight into a simulator’s educational impact. The lack of validation studies appears greater than in other specialties: similar systematic reviews of otolaryngology and orthopaedic simulation training reported rates of descriptive studies of 23% and 38%, respectively [146, 147]. This resonates with findings from a recent review of simulation-based validation studies across all surgical specialities, which reported that only 6.6% used Messick’s validity framework [148]. Evidence for a number of components was particularly deficient. Internal structure, a fundamental component evaluating the reliability and generalisability of scores, was rarely assessed. For the wet-lab and dry-lab groups, many authors attempted to establish validity through feedback from study participants on whether the simulator was a valid representation of the surgical correlate. This approach is flawed, since most of these participants were inexperienced; input should instead come from those with greater expertise in the procedure of interest. Effectiveness and translational outcomes were also not extensively tested. In particular, wet-lab and dry-lab simulation studies predominantly reported evidence from user satisfaction surveys, with few assessing skill improvement and none investigating the relationship to operating room performance or patient-related outcomes. Although several studies have linked Eyesi training with reduced complication rates, the majority were retrospective and did not control for important confounders such as participants undertaking other forms of training.

A few studies explored the collateral effects of simulation training at a systemic level, such as cost savings or policy changes. Two separate preliminary analyses of cost effectiveness were carried out in 2013; both suggested that the cost-to-benefit ratio was unfavourable. One study predicted, on the basis of cost modelling, that residency programmes would not be able to recoup the cost of purchasing one Eyesi model within 10 years even under the most optimistic scenario [149]. The other suggested that, realistically, it would take 34 years to recover the cost [150]. In contrast, the most recent study, by the Royal College of Ophthalmologists in the UK, argued that the Eyesi is cost effective once the costs of complications are included. Access to an Eyesi simulator led to a 1.5% decrease in complication rates, inferred to result in an estimated 280 fewer cases of posterior capsular rupture per year. This would amount to a saving of roughly £560,000 per year, and, using this figure, the authors calculated that the cost of purchasing 20 Eyesi simulators would be recouped within 4 years.
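For illustration, the payback arithmetic implied by these reported figures can be back-calculated as follows (the per-case and per-unit costs below are our inferences from the stated totals, not values given in the College’s analysis):

\[
\frac{£560{,}000/\text{year}}{280\ \text{cases/year}} = £2{,}000\ \text{per avoided complication}; \qquad
4 \times £560{,}000 = £2.24\,\text{M} \approx £112{,}000\ \text{per simulator for 20 units}.
\]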
Given the contrasting findings of these three studies, and the implications for both patient safety and healthcare costs, further work should provide an updated assessment of the cost effectiveness of Eyesi simulators. In addition, further studies should test whether the benefits of Eyesi training can be achieved with lower-cost models.

This study has limitations. Although a broad search strategy with comprehensive search terms was applied, it is possible that some reports using different terminology were missed. As discussed above, a large proportion of studies suffered from poor methodology, utilising outdated concepts of validation and greatly limiting the conclusions that can be drawn. The heterogeneity in methodology and outcomes across studies also precluded quantitative analysis. In addition, the majority of e-learning studies included in this review recruited medical students rather than ophthalmic professionals, so their results may not reflect specialised training.

Conclusion

The increasing importance of simulation training in ophthalmology is reflected in the number and variety of models described in the literature. The Eyesi Surgical remains the only model to have undergone extensive testing, and the evidence necessary to support its use has been reported. The main limitations of current research lie in the use of outdated validity frameworks, the lack of attempts to establish the collateral, systemic effects of simulator models, and the low quality of validation study designs. Future studies should follow current recommendations on the assessment and validation of educational tools to ensure that simulation-based training is successfully incorporated into current systems of ophthalmology training, especially for high-stakes applications such as credentialing and assessment.