Introduction

Cataract blindness affects twenty million people worldwide, primarily in low- and middle-income countries1. This blindness is reversible with successful cataract surgery. Manual Small Incision Cataract Surgery (MSICS) is a low-cost, low-technology cataract surgery technique that has been found to have outcomes comparable to the more expensive, high-technology technique of phacoemulsification2. However, the shortage of eye surgeons globally has led to a backlog of patients in need of cataract surgery3. Additionally, training new surgeons is challenging due to the increased risk of complications in cataract surgery performed by trainees.

Simulation-based training (SBT) has been shown to improve the operating room performance of surgeons, resulting in better patient outcomes4,5,6,7,8. While previous research has focused on simulators for phacoemulsification cataract surgery, SBT could also be an effective way to train more surgeons in MSICS. A non-profit humanitarian organization called HelpMeSee has developed a virtual reality simulator with haptic feedback to train surgeons in the safe performance of MSICS. The aim of HelpMeSee is to increase the number of MSICS surgeons operating globally, increase the number of MSICS surgeries performed, and restore vision to a greater number of individuals worldwide (https://helpmesee.org).

Mastery learning is an approach to SBT whereby trainees are assessed at baseline and during training, with the end goal of attaining a minimum passing standard9. To ensure evidence-based proficiency assessment in mastery learning, it is necessary to demonstrate evidence of validity for the assessment tools used10. According to the Standards for Educational and Psychological Testing11, there are five sources of validity evidence: test content, response processes, internal structure, relations to other variables, and consequences of testing. Demonstrating evidence from these five sources supports the argument that a test is valid in a particular context.

Currently, there is no objective virtual reality simulation assessment of performance in the steps of MSICS surgery with established validity evidence. To address this gap, our aim was to develop a test of MSICS competencies using the HelpMeSee simulator and to provide validity evidence for the use of its automated, unbiased outcome metrics.

Methods

This prospective study was carried out at the Copenhagen Academy for Medical Education and Simulation (CAMES), Capital Region of Denmark, and the Instituto Mexicano De Oftalmología (IMO), Querétaro, Mexico, from October 2020 to April 2021.

This study adhered to the tenets of the Declaration of Helsinki. No human patients were involved in the study. The Ethics Committee of the Capital Region of Denmark ruled that approval was not required for this study (protocol no. 20051000).

The HelpMeSee simulator for MSICS was used for this study. A team of MSICS clinical experts with experience in simulator development selected 11 steps of the HelpMeSee MSICS Standard Procedure for testing, and the related outcome metrics were chosen to reflect clinically relevant assessment parameters. To ensure the best possible validity evidence for test content, we included all 11 steps in our test.

Three groups of participants were defined: (1) ophthalmologists with no cataract surgery experience, (2) experienced phacoemulsification cataract surgeons with no MSICS experience, and (3) surgeons experienced in both phacoemulsification and MSICS cataract surgery. Inclusion criteria for each of the groups were: (1) ophthalmologists employed at an ophthalmology department without any cataract surgery experience, (2) surgeons with > 1000 phacoemulsification procedures and no MSICS experience, and (3) surgeons with > 1000 phacoemulsification procedures and > 500 MSICS operations. All novices were recruited from the Department of Ophthalmology, Rigshospitalet, Denmark and IMO, Mexico. Cataract surgeons were recruited from ophthalmology departments or private specialist clinics in the Zealand region of Denmark and from IMO. All surgeons had to be active surgeons at the time of the study, having operated within the past month. Exclusion criteria for all three groups included previous experience with the HelpMeSee simulator for more than 2 h during the past 3 months. In addition, participants who had more than 2 h of experience on another virtual reality simulator in the past 3 months were excluded.

Study size

To justify the assumption of normally distributed test scores, we aimed to include at least 10 experienced MSICS surgeons and 10 MSICS novices12.

Data collection

The data collection was standardized to avoid threats to validity by ensuring that the administrator of the test did not influence the process and that every participant had a fair and equal test administration (validity evidence towards response process). During testing, three authors (SC, LW, LO) gave verbal instructions to all participants based on a written document to ensure that the same instructions were given to all participants. Only the technical aspects of their performance were intended to be assessed.

Overview of the standardized data collection procedure for each participant

Each participant was assigned a unique ID number and was provided with information about HelpMeSee and the study. Informed consent was obtained, and a questionnaire collecting demographic data and experience was completed. Stereoacuity was measured using the TNO test (Laméris Ootech BV, 19th edition). Each participant then viewed an approximately 12-min video of the HelpMeSee MSICS Standard Procedure (https://vimeo.com/426195663). A 10-min warmup followed, using the first two assignments. Practical information was provided, including instructions on how to start the simulation, the objectives for each task, and descriptions of the instruments, the training tools, and the scoring parameters. For data collection, each participant completed the entire procedure twice, with the possibility of a short break between procedures. Completion of both procedures could not exceed 2 h; to reduce the risk of fatigue, participants exceeding this limit returned at a later time to finish. Participants were instructed to complete each task and decided themselves when a task was successfully finished or had to be ended due to a complication. During the attempts, the participants were not able to view the simulator metrics.

In order to evaluate performance, 55 different metrics were assessed across all the steps of the procedure. Some of the metrics recorded whether an attempt resulted in an outcome within a designated range, such as the length of the inner tunnel limit during scleral tunnel dissection. Other metrics recorded whether a specific complication occurred, such as uveal prolapse during creation of a scleral groove.
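As a minimal illustration of how these two metric types reduce to pass/fail scores, the sketch below uses hypothetical metric names and limits; the simulator's actual scoring logic is proprietary.

```python
# Minimal sketch only; the HelpMeSee simulator's real scoring logic is proprietary,
# and the metric names and limits below are hypothetical.

def score_range_metric(value: float, lower: float, upper: float) -> int:
    """Pass (1) if the measured outcome falls within the designated range."""
    return 1 if lower <= value <= upper else 0

def score_complication_metric(complication_occurred: bool) -> int:
    """Pass (1) if the specific complication did not occur during the step."""
    return 0 if complication_occurred else 1

# Hypothetical inner tunnel length (mm) during scleral tunnel dissection:
print(score_range_metric(2.1, lower=1.5, upper=2.5))           # 1 (pass)
# Uveal prolapse during creation of the scleral groove:
print(score_complication_metric(complication_occurred=True))   # 0 (fail)
```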

Statistical analysis

The HelpMeSee simulator uses proprietary scoring logic that was converted to a binary output, with a score of 1 indicating a pass and a score of 0 indicating a fail for each metric on each task. Grand-mean imputation was used to fill in 64 missing values out of a total of 1275 in the data set13. The data were then imported into SPSS software version 26.0 (SPSS, Inc., Chicago, IL) for statistical analysis.
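The analysis itself was performed in SPSS; the sketch below merely illustrates the grand-mean imputation step in Python on a toy score matrix with hypothetical column names.

```python
import numpy as np
import pandas as pd

# Toy binary score matrix (1 = pass, 0 = fail) with missing cells (NaN);
# the real data set contained 55 metrics and 1275 values in total.
scores = pd.DataFrame({
    "scleral_groove_depth":   [1, 0, np.nan, 1],
    "inner_tunnel_length":    [0, 1, 1, np.nan],
    "uveal_prolapse_avoided": [1, 1, 0, 1],
})

# Grand-mean imputation: every missing cell is replaced by the mean of all observed cells.
grand_mean = scores.stack().mean()   # stack() drops NaN, so this averages observed values only
scores_imputed = scores.fillna(grand_mean)

print(f"grand mean = {grand_mean:.2f}")
print(scores_imputed)
```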

Item-total statistics were used to reduce the number of metrics in the test: individual metrics with low or no discrimination, based on the corrected item-total correlation14, were removed, as were all items with negative discrimination. Inter-metric reliability analysis was performed using the intraclass correlation coefficient to calculate Cronbach's alpha and its associated confidence interval.
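A rough Python sketch of this item-reduction and reliability step is shown below; the simulated data, metric names, and retention threshold (corrected item-total correlation > 0) are illustrative assumptions, not the exact cutoffs used in the study.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Simulated binary scores: 25 participants x 8 metrics driven by a latent ability,
# so that items correlate; metric names are placeholders.
ability = rng.normal(size=(25, 1))
scores = pd.DataFrame(
    (rng.normal(size=(25, 8)) + ability > 0).astype(int),
    columns=[f"metric_{i}" for i in range(8)],
)

def corrected_item_total_correlation(items: pd.DataFrame) -> pd.Series:
    """Correlation of each item with the summed score of the remaining items."""
    total = items.sum(axis=1)
    return pd.Series({c: items[c].corr(total - items[c]) for c in items.columns})

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha from item variances and total-score variance."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

# Remove non-discriminating items (illustrative threshold: keep positive correlations only).
itc = corrected_item_total_correlation(scores)
retained = scores[itc[itc > 0].index]
print(itc.round(2))
print(f"Cronbach's alpha of retained items: {cronbach_alpha(retained):.2f}")
```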

Mean test scores for MSICS novices (both the ophthalmologists without cataract surgery experience and the phacoemulsification-only cataract surgeons) and experienced MSICS surgeons were compared using independent samples t-tests.
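As an equivalent of the SPSS comparison, a minimal Python sketch with hypothetical total scores (out of 30) might look like this:

```python
import numpy as np
from scipy import stats

# Hypothetical total test scores; the study reported means (SD) of 15.5 (3.0) and 22.7 (4.3).
novice_scores = np.array([12, 14, 15, 16, 13, 17, 18, 15, 16, 14, 19, 15, 16, 13, 17])
expert_scores = np.array([19, 21, 24, 26, 22, 27, 18, 25, 23, 22])

t, p = stats.ttest_ind(novice_scores, expert_scores)
print(f"t = {t:.2f}, p = {p:.4f}")
```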

The Contrasting Groups’ Method was employed to establish a proficiency level15. The pass/fail score was determined at the intersection between the distributions of test scores obtained from novices and experienced MSICS surgeons.
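A minimal sketch of the contrasting groups calculation is given below, fitting normal curves to the group means and standard deviations reported in the Results; because it uses summary statistics rather than the full score distributions, the intersection it finds is only an approximation of the published 20-point standard.

```python
from scipy.stats import norm
from scipy.optimize import brentq

def contrasting_groups_cutoff(mu_novice, sd_novice, mu_expert, sd_expert):
    """Score at which the two fitted normal curves intersect, searched between the group means."""
    def diff(x):
        return norm.pdf(x, mu_novice, sd_novice) - norm.pdf(x, mu_expert, sd_expert)
    return brentq(diff, mu_novice, mu_expert)

# Group means and SDs as reported in the Results (novices 15.5 +/- 3.0; experienced 22.7 +/- 4.3).
cutoff = contrasting_groups_cutoff(15.5, 3.0, 22.7, 4.3)
print(f"approximate pass/fail standard: {cutoff:.1f} points")
```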

Results

The MSICS novice group included 15 participants: 10 ophthalmologists with no experience in phacoemulsification or MSICS and 5 experienced phacoemulsification cataract surgeons without MSICS experience. The experienced MSICS group consisted of 10 surgeons experienced in both phacoemulsification and MSICS cataract surgery.

Table 1 shows the original 55 metrics recorded, and the 30 metrics included in the test.

Table 1 Metrics included and excluded from the evidence-based test to assess MSICS performance.

The intraclass correlation analysis yielded a Cronbach's alpha of 0.86, with a 95% confidence interval of 0.77 to 0.93.

Using the 30-item test, the novices had a mean score of 15.5 (SD 3.0) and the experienced MSICS surgeons had a mean score of 22.7 (SD 4.3) (p < 0.001).

Figure 1 shows that the pass/fail standard was established at 20 points (out of 30). This resulted in only 1 out of 15 novices passing the test (6.7% false positives) and 3 out of 10 experienced surgeons failing the test (30% false negatives).

Figure 1

The establishment of a credible pass/fail standard using the contrasting groups' method. The curves are constructed from the mean scores and standard deviations of novice (blue) and experienced (orange) operators, respectively. The bold line at the intersection marks the pass/fail standard of 20 points, and the small dotted lines show the 95% confidence interval of the standard.

Discussion

We examined the validity of the original 55 metrics captured by the HelpMeSee simulator for the complete MSICS standard procedure and developed an evidence-based test that was narrowed down to 30 metrics. A passing score on this test was determined to be 20 out of 30. Only 1 of the 15 novice MSICS surgeons (mean score 15.5) obtained a passing score, while 7 out of 10 experienced MSICS surgeons (mean score 22.7) passed the test.

In the development of an evidence-based assessment tool, we used only metrics provided by the simulator. These metrics were based on whether the study participant's performance of a step of the procedure resulted in a measurement within a specified range or avoided a specific error during that step. Using simulator-provided metrics has several benefits, including ease of access and the ability to make numerical comparisons between test takers. Utilizing metrics directly from the simulator also eliminates the bias that may occur when performance is evaluated by human raters16. This is the first study to develop an evidence-based test for the assessment of MSICS performance using the HelpMeSee simulator, but a similar test based solely on simulator metrics was created for phacoemulsification cataract surgery using the EyeSi virtual reality simulator8. That test retained 7 of its 13 modules (54%) as demonstrating significant discriminative ability, a proportion similar to our test, which retained 30 of 55 metrics (55%).

While using only simulator metrics has some advantages, there are also drawbacks. By relying on binary test scores alone, some of the richness of information that can be obtained through global rating scales is lost17. Trained raters can evaluate surgical performance by live viewing or video review, allowing for a more comprehensive evaluation, but this approach is time consuming, resource intensive, and prone to bias18,19. For example, the simulator's color graphics overlay captures not only the length and width of the scleral-corneal dissection but also variations in depth and deviations of the shape from recommended guidelines; however, there is currently no automated way to assess performance using such an overlay. One way to improve assessment might be to use artificial intelligence to analyze the graphics overlays provided by the simulator.

It is also important to note that, since our validated test only uses those metrics captured by the HelpMeSee simulator for assessment, it is limited by the metrics that are available and may not adequately represent all aspects of performance. Construct underrepresentation is a major threat to validity20. In addition, the number of metrics captured for each step is fixed by the simulator. For example, the step “Delivering the Nucleus” has only two associated metrics, while the step “Performing Hydrodissection and Nucleus Dislocation” has 11 metrics. The balance of our test is thus impacted by the weight given to the different steps. It is possible that the test could be improved by considering additional metrics or adjusting the weighting of steps. As new versions of the HelpMeSee simulator software are released, updates can include better descriptions of the metrics to remove ambiguity and clarify exactly what is being measured during specified simulator tasks. Also, combining simulator metrics with expert raters might be helpful, especially during final assessment.

Interestingly, none of the 5 experienced phacoemulsification cataract surgeons without MSICS experience passed the test. Despite the similarities between some of the steps of MSICS and phacoemulsification, this finding suggests that experience in performing phacoemulsification does not necessarily translate to proficiency in MSICS. This agrees with previous studies showing that microsurgical motor learning is highly task specific21,22,23,24. Thomsen's study compared the performance of ophthalmology residents who had received training in cataract surgery to those who had not, and found no significant difference in their ability to perform vitreoretinal procedures on the EyeSi simulator. While it would be unlikely for a cataract surgeon to begin performing vitreoretinal surgery without first obtaining additional fellowship training, it is much more likely that a phacoemulsification surgeon might start performing MSICS without additional formal training. The additional knowledge and skills required to learn MSICS may therefore be underestimated, because it is not intuitive that surgical skills in one ocular procedure do not necessarily transfer to another.

A strength of our study design was the use of a standardized method for collecting data. This standardized method allowed for collecting data internationally from multiple centers and ensured that all the steps of the MSICS procedure were included in the development of our test to capture the complexity of the procedure.

This study has some limitations. One limitation is the small number of participants, which is common in medical education research12. Another limitation is that the simulator and its associated metrics are designed to assess performance based on the technique used in the HelpMeSee MSICS Standard Procedure. Many experienced MSICS surgeons may use techniques that differ from this standard procedure, and thus they may have encountered difficulties completing a particular task during data collection that they would not encounter during live surgery. Lastly, we used the number of cataract surgeries performed and the date they were last performed as a proxy for competence in MSICS. However, we did not capture how frequently the experienced surgeons had specifically performed MSICS, which could potentially impact their level of competence in this procedure.

Several studies have shown the effectiveness of virtual reality simulators in training and assessing phacoemulsification cataract surgery skills7,25. With our establishment of an evidence-based test, it is now possible to provide proficiency-based training using the HelpMeSee simulator for MSICS. While we found no inter-procedural transfer of skills from phacoemulsification to MSICS, phacoemulsification surgeons may still require reduced training time due to their previous knowledge and experience. Using our test, simulation-based training can be customized to match the background of the surgeon. The proficiency of the surgeon can then be assessed before progressing on to supervised live surgery using simulator metrics which have been shown to have validity evidence. In the future, this evidence-based test for MSICS surgery can be used to investigate the effectiveness of various training interventions or to design curricula for MSICS training.

Conclusion

With this study, we have demonstrated the development of an evidence-based test for MSICS surgery using virtual reality simulation. This test can be used for assessment during MSICS training. In addition, it can be used as a test when evaluating different training methods for teaching MSICS.