Introduction

Training in the operating theatre is often unstructured and depends on chance encounters determined by patient and disease variability. A particular facet of surgical practice is the need to train inexperienced individuals to a level of competence in their chosen field. Although such training is supervised and conducted with the informed consent of the patient, it may no longer be an ethically or economically viable option for modern medical practice. It is thus necessary to explore, define, and implement modes of surgical skills training that do not expose the patient to preventable errors.1

There are many tools currently available for training and assessment in phacoemulsification surgery (PS) outside the theatre.2 In keeping with the ‘learning by doing’ philosophy of Piaget and Vygotsky, laboratory practice allows surgeons to acquire skills in a controlled environment, free of the pressures of operating on real patients. Wet labs use cadaveric human or animal models, or synthetic eyes designed specifically for phacoemulsification, to rehearse the steps of cataract extraction. However, these methods have been criticised for being unrealistic,3 for inaccurately simulating tissue consistency and anatomy,4 and for lacking any form of objective assessment. Simulation in the form of virtual reality (VR) and synthetic models has been proposed for technical skills training in the early part of the learning curve in other fields of surgery.5, 6, 7, 8 VR simulators are now starting to be introduced as an adjunct to microsurgical skills courses. However, it is considered preferable for training to be structured within a standardized curriculum.9 Such a curriculum should comprise knowledge-based learning, a stepwise technical skills pathway, ongoing feedback, and progression towards proficiency goals, enabling transfer of skills to the real environment.10

The aim of this study was to develop an evidence-based and stepwise VR training curriculum for acquisition of technical skills for PS. Although simulators have been evaluated as a part-task training platform for differentiating and developing basic ophthalmic microsurgical skills,11, 12 this is the first time that a phacoemulsification simulator has been subjected to a structured scientific method for curriculum development.

Materials and methods

Subjects were recruited and divided into novice (fewer than 10 PSs performed), intermediate (50–200 PSs), and experienced (>500 PSs) operators. Recruitment was solely through personal communication. The only exclusion criterion was previous training experience with a phacoemulsification simulator.

The phacoemulsification interface of the EYESI surgical simulator (VRmagic, Mannheim, Germany) was used for this study, and both abstract skills (such as forceps training; Figure 1a) and procedural tasks (eg, capsulorhexis; Figure 1b) were assessed. The simulator tasks vary in difficulty, and the nine selected tasks reflect this range. A detailed description of the selected simulator tasks is provided in Table 1.

Figure 1

Screen shots of (a) forceps training abstract task and (b) capsulorhexis procedural task.

Table 1 Description of the selected modules on the EYESI VR phacoemulsification simulator

Each of the four abstract skill tasks and five procedural tasks was performed in two sessions by all novice, intermediate, and experienced subjects. All sessions were completed at least 1 h apart. Before commencing each task, every subject was given a full demonstration by an experienced operator and a one-on-one simulator familiarization session, during which no assistance was provided.

Data for each task were measured objectively by the simulator's built-in scoring software and comprised 14–31 metrics, depending on the task. The data were transferred to a Microsoft Excel spreadsheet (Microsoft Corporation, Redmond, WA, USA).

Performance evaluation

Construct validity is a test of whether a model can differentiate between different levels of experience, and thus be used to assess performance.10 Median performance was compared among the three groups of surgeons to assess whether each simulated task was construct valid, thereby substantiating the use of the defined simulator settings to assess phacoemulsification technical skill.

Benchmark criteria, to be achieved before progression to the next stage of the curriculum, were defined as the median score for each parameter in the second session of all experienced surgeons.
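The benchmark rule amounts to a per-metric median over the experienced group's second-session scores. The Python sketch below illustrates it. The function name, the dictionary layout, and the metric keys are hypothetical and do not reflect the simulator's actual export format.

```python
from statistics import median


def benchmark_criteria(experienced_session2):
    """Median of each metric across experienced surgeons' second sessions.

    experienced_session2: list with one dict per experienced surgeon, mapping a
    metric name (hypothetical keys, e.g. "global_score") to that surgeon's
    second-session value. Every dict is assumed to contain the same metrics.
    """
    metrics = experienced_session2[0].keys()
    return {
        metric: median(surgeon[metric] for surgeon in experienced_session2)
        for metric in metrics
    }


# Hypothetical scores for three surgeons (the study used ten per group).
example = [
    {"global_score": 90, "time_s": 60},
    {"global_score": 80, "time_s": 70},
    {"global_score": 100, "time_s": 50},
]
print(benchmark_criteria(example))  # {'global_score': 90, 'time_s': 60}
```

With an even number of surgeons, as in this study, `statistics.median` returns the mean of the two middle values.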

These clearly defined steps, namely construct validation through comparative measurement of simulator-derived metrics and benchmark definition, enabled the assembly of a curriculum for abstract and procedural training based on data rather than supposition. This provided an evidence- and proficiency-based pathway for novice surgeons to follow.

Statistical analysis

The choice of 10 subjects per group was based on a power calculation for a two-tailed test, with α=0.05, power (1–β)=0.80, and a 30% reduction in time taken to complete tasks for experienced vs novice operators as the difference to be detected, based on data from previous studies of VR simulation.13, 14 This yielded a sample size of eight subjects per group, which was increased to ten to allow for dropout and technical malfunction of the simulator.
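The text does not state the formula or the variability assumed in the power calculation. As an illustration only, the common normal-approximation formula for comparing two means, n = 2 × ((z(1 − α/2) + z(1 − β)) / d)², where d is the expected difference divided by its standard deviation, can be written in Python as follows. The choice of formula and the function name are assumptions, and the sketch does not attempt to reproduce the reported figure of eight per group.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Subjects per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{1-beta}) / d) ** 2,
    where d = (expected difference) / (standard deviation). Rounded up.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)           # about 0.84 for power = 0.80
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# A 30% reduction in task time gives d = 0.30 * mean_novice_time / sd,
# so the required n depends on the (unreported) standard deviation.
print(n_per_group(1.0))  # 16
print(n_per_group(0.5))  # 63
```

Because the analysis used rank-based tests, an adjustment for their relative efficiency (dividing n by about 0.955) is sometimes applied to a figure obtained this way.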

The data were analysed with SPSS version 18.0 (SPSS, Chicago, IL, USA) using non-parametric tests. Performance was compared among the experienced, intermediate, and novice groups using the Kruskal–Wallis test (P<0.050 considered statistically significant) and, as appropriate, between pairs of groups using the Mann–Whitney U test (P<0.017 after Bonferroni adjustment, ie, 0.05 divided by the three pairwise comparisons).
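The two-stage comparison can be sketched in standard-library Python: an omnibus Kruskal–Wallis test followed by Bonferroni-adjusted pairwise Mann–Whitney U tests. This sketch assumes that the pairwise tests are run only when the omnibus test is significant. It is not the SPSS implementation. The Kruskal–Wallis p-value uses the chi-square distribution with two degrees of freedom, whose survival function is exactly exp(−H/2). The Mann–Whitney p-value uses the normal approximation with tie and continuity corrections, whereas SPSS may report exact p-values for small samples. All function names are hypothetical. In practice, `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu` provide these tests.

```python
import math
from itertools import combinations
from statistics import NormalDist


def rank_with_ties(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def tie_sum(values):
    """Sum of t**3 - t over each set of t tied values."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(t ** 3 - t for t in counts.values())


def kruskal_wallis(*groups):
    """Tie-corrected H and its chi-square p-value for exactly three groups."""
    if len(groups) != 3:
        raise ValueError("exp(-H/2) is the chi-square p-value only for 2 df")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = rank_with_ties(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - tie_sum(pooled) / (n ** 3 - n)
    if correction == 0:  # every observation identical
        return 0.0, 1.0
    h /= correction
    return h, math.exp(-h / 2)


def mann_whitney_u(x, y):
    """U for x and a two-sided p-value (normal approximation, tie and
    continuity corrected)."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u1 = sum(rank_with_ties(pooled)[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_sum(pooled) / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = max(abs(u1 - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    return u1, 2 * (1 - NormalDist().cdf(z))


def compare_groups(novice, intermediate, experienced, alpha=0.05):
    """Omnibus test, then Bonferroni-adjusted pairwise tests (assumed to be
    run only when the omnibus test is significant)."""
    groups = {"n": novice, "i": intermediate, "e": experienced}
    h, p = kruskal_wallis(novice, intermediate, experienced)
    pairs = {}
    if p < alpha:
        adjusted = alpha / 3  # three pairwise comparisons: 0.05 / 3 ≈ 0.017
        for a, b in combinations(groups, 2):
            _, p_ab = mann_whitney_u(groups[a], groups[b])
            pairs[a + b] = (p_ab, p_ab < adjusted)
    return {"H": h, "p": p, "pairs": pairs}
```

The returned `pairs` entries are keyed "ni", "ne", and "ie". Each entry holds the pairwise p-value and whether it falls below the adjusted threshold.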

Results

Thirty subjects, comprising ten novice (n), ten intermediate (i), and ten experienced (e) operators, were recruited. All subjects completed two sessions on each of the four abstract skill tasks and two sessions on each of the five procedural tasks. Only statistically significant metrics are discussed here. There was no statistically significant difference between the first and second repetitions of these metrics except where otherwise stated. Second-session scores were used for analysis to further reduce the effect of participant familiarization with the simulator during the first session. Construct validity was initially established overall for the total global scores of the nine selected tasks (Figure 2).

Figure 2

Total no. of global scores. Horizontal lines within boxes, boxes, and whiskers represent median, interquartile range, and range, respectively. Circle represents an outlier (P≥0.05, Kruskal–Wallis test).

Abstract tasks

The metrics within the easier abstract modules demonstrated a ‘ceiling effect’, with construct validity established between the (n) and (i) groups and between the (n) and (e) groups, but not between the (i) and (e) groups.

Statistical significance was achieved primarily for global score. The Anti-tremor 1 global score differed significantly only in the first repetition and was therefore excluded. Forceps 1 global scores differed significantly between (n) and (i) (46, 87, and 95 for (n), (i), and (e), respectively; P<0.001). With increasing task difficulty, global score performance was significantly reduced in (n), with minimal difference between (i) and (e): Anti-tremor 4 (0, 51, and 59) and Forceps 4 (11, 73, and 94), both P<0.001 between (n) and (i). Anti-tremor 1 and 4 showed similar results for average tremor value (47.1, 34.4, and 34.3, and 45.6, 35.9, and 35.3, respectively; P<0.017 between (n) and (i)).

Incision stress value in both tasks at both levels of difficulty also differed significantly between (n) and (i) but not between (i) and (e): Anti-tremor 1 and 4 (0.23, 0, and 0; P<0.017, and 3.4, 0, and 0; P<0.017) and Forceps 1 and 4 (3.05, 0.03, and 0; P<0.017, and 7.43, 0.12, and 0; P<0.017). Likewise, time taken (in seconds) differed significantly between (n) and (i) only for the more difficult tasks, Anti-tremor 4 (76.5, 54, and 52.5; P≤0.017) and Forceps 4 (115.5, 71, and 68; P≤0.017), but not for the easier Anti-tremor 1 and Forceps 1. This metric again demonstrated a ‘ceiling effect’ in the experienced group.

Procedural tasks

Procedural modules were found to be construct valid between groups (n) and (i) and between groups (i) and (e). This was the case for global score metrics in Lens cracking (0, 22, and 51; P<0.017) and Phaco of quadrants (16, 53, and 87; P<0.017). In capsulorhexis 1, the global scores demonstrated a similar trend (0, 19, and 63; P<0.017). As the difficulty of the task increased (capsulorhexis 3 and 5), the global score performance in the (n) and (i) group decreased but improved in the (e) group (0, 55, and 73; P<0.017) and (0, 48, and 76; P<0.017).

In addition, in the capsulorhexis module, the more difficult the task, the greater the number of significant metrics observed. Capsulorhexis 1 showed a significant difference only for global score, whereas capsulorhexis 5 had the most construct valid metrics (Table 2). However, like the abstract tasks, these metrics differed significantly between (n) and (i) but not between (i) and (e). They included Radial deviation value (0.18, 0.06, and 0.03; P<0.017), Maximum radial extension value (3.33, 1.52, and 0.31; P<0.017), and Lens damage value (15.95, 3.175, and 3.185; P<0.017).

Table 2 Construct valid metrics for task-capsulorhexis, level of difficulty-5, repetition-2

Curriculum construction

Only the statistically significant metrics were used to develop the training curriculum. The result is a proficiency-based VR curriculum for training in PS, summarized in Figure 3.
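The selection step can be expressed as a filter. A metric enters the curriculum only if its group comparison was significant. It is then paired with its experienced-surgeon benchmark and the direction in which the trainee must meet it: higher for scores, lower for values such as time or tremor. The sketch below assumes this representation. The function name, metric names, and values are hypothetical. The single significance threshold simplifies the omnibus and pairwise criteria used in the study.

```python
def curriculum_targets(p_values, benchmarks, lower_is_better, alpha=0.05):
    """Keep significant metrics and attach (direction, benchmark) to each.

    p_values: metric -> p-value from the group comparison.
    benchmarks: metric -> experienced-surgeon median (second session).
    lower_is_better: metrics where smaller values indicate better performance.
    """
    targets = {}
    for metric, p in p_values.items():
        if p < alpha:
            direction = "<=" if metric in lower_is_better else ">="
            targets[metric] = (direction, benchmarks[metric])
    return targets


# Hypothetical inputs for one task; "time_s" is dropped as non-significant.
targets = curriculum_targets(
    p_values={"global_score": 0.001, "time_s": 0.20, "tremor": 0.01},
    benchmarks={"global_score": 90, "time_s": 60, "tremor": 35.0},
    lower_is_better={"time_s", "tremor"},
)
print(targets)  # {'global_score': ('>=', 90), 'tremor': ('<=', 35.0)}
```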

Figure 3

Evidence-based virtual reality training curriculum for PS. (v)=value; (s)=seconds.

Discussion

This study applied a stepwise process to the modules and metrics of a VR simulator, resulting in the development of a training curriculum for PS. The modules were shown to be construct valid through comparison of performance across three levels of surgical experience. Interestingly, the intermediate and experienced groups performed similarly on all of the abstract tasks and on some of the procedural tasks. This finding is to be expected, as surgeons in the intermediate group are approaching the plateau phase of their learning curve for PS. Novice subjects are thus most likely to benefit from this training curriculum.

Training within the curriculum commences at the abstract skills modules, with two repetitions of all four skills. Progression to the procedural tasks necessitates achievement of the benchmark proficiency criteria, which are based on the scores derived from the performance of experienced phacoemulsification surgeons. The structure of the curriculum is identical for the five procedural tasks, which again have proficiency criteria for the trainee to achieve before completion of the training period. It is also important to note that the curriculum adheres to the concept of ‘distributed’ rather than ‘massed’ training schedules, with a maximum of two sessions performed per day, each at least 1 h apart.15, 16 Finally, to confirm acquisition of skill rather than attainment of a good score by chance, all benchmark levels must be achieved at two consecutive sessions.
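The two progression rules in this paragraph can be checked mechanically: all benchmarks must be met in two consecutive sessions, and there may be at most two sessions per day, at least 1 h apart. In this hypothetical Python sketch, each benchmark is stored as a (direction, value) pair and sessions are dictionaries of metric values in chronological order. The 1 h gap is assumed to be measured between session start times. None of these representations comes from the study.

```python
from datetime import datetime, timedelta


def meets_all(session, targets):
    """True if every metric in the session meets its (direction, benchmark)."""
    for metric, (direction, benchmark) in targets.items():
        value = session[metric]
        if direction == ">=" and value < benchmark:
            return False
        if direction == "<=" and value > benchmark:
            return False
    return True


def is_proficient(sessions, targets):
    """True once all benchmarks are met in two consecutive sessions."""
    streak = 0
    for session in sessions:
        streak = streak + 1 if meets_all(session, targets) else 0
        if streak == 2:
            return True
    return False


def schedule_ok(start_times, max_per_day=2, min_gap=timedelta(hours=1)):
    """Distributed practice: at most max_per_day sessions on any date and
    at least min_gap between consecutive session start times."""
    times = sorted(start_times)
    per_day = {}
    for t in times:
        per_day[t.date()] = per_day.get(t.date(), 0) + 1
    if any(count > max_per_day for count in per_day.values()):
        return False
    return all(later - earlier >= min_gap for earlier, later in zip(times, times[1:]))


targets = {"global_score": (">=", 80), "time_s": ("<=", 60)}  # hypothetical
sessions = [
    {"global_score": 85, "time_s": 55},  # meets both
    {"global_score": 70, "time_s": 50},  # misses score: streak resets
    {"global_score": 90, "time_s": 58},
    {"global_score": 82, "time_s": 59},  # second consecutive pass
]
print(is_proficient(sessions, targets))  # True
print(schedule_ok([datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 10, 30)]))  # True
```

Resetting the streak after any failed session encodes the requirement that a good score must be repeated immediately, rather than achieved twice at any point.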

Other studies have investigated the construct validity of the EYESI phacoemulsification VR simulator. Mahr and Hodge17 as well as Le et al18 analysed performance on the anti-tremor and forceps modules, while Privett et al19 focused on the easy and medium levels of the capsulorhexis module. However, the organization of such data into a coherent, stepwise, and proficiency-based training curriculum, which is fundamental to the use of simulation in clinical training schedules, has not previously been pursued.

A common complaint is the expense, in terms of simulator cost and upkeep, training space, and faculty time, required to integrate VR curricula into residency programmes.20 If such training shortens the learning curve during real operations, the total cost of training each surgical resident may be reduced. In terms of training schedules, this curriculum prescribes two sessions per day, at least 1 h apart. The evidence for distributed training schedules is clear, although it is uncertain whether this means practice once a day or once a week.11, 12 Flexibility in scheduling training sessions will be needed when implementing this curriculum, but this should not detract from skill acquisition, as curriculum completion is based on achieving proficiency measures.

This training programme is not intended as a substitute for skills acquisition in the operating theatre, but it should allow part of the learning curve to be transferred to the skills laboratory.3 A cognitive skills module, such as that available from the Royal College of Ophthalmologists microsurgical skills course, is also essential at the front end of any training programme. Furthermore, completion of this curriculum is based on dexterity rather than on safety scores or clinical outcome measures. It is therefore important to use technical skills rating scales and to integrate them into the simulator software.2

It is crucial to disseminate this curriculum to other users of VR simulation, to enable external validation of its ease of use and feasibility, and to define learning curves that determine the minimum number of repetitions needed to establish proficiency. It could then ultimately be confirmed whether its use produces the ‘pretrained novice’ who can operate with greater dexterity and skill on patients undergoing phacoemulsification. It is then only a matter of time before other domains of ophthalmic surgical practice follow this lead of simulation-based training, with objective measurement of performance before operative intervention.