Validation of a high-fidelity training model for fetoscopic spina bifida surgery

Open fetal surgery for spina bifida (SB) is safe and effective yet invasive. The growing interest in fetoscopic SB repair (fSB-repair) prompts the need for appropriate training. We aimed to develop and validate a high-fidelity training model for fSB-repair. fSB-repair was simulated in the abdominal cavity and on the stomach of adult rabbits. Laparoscopic fetal surgeons served either as novices (n = 2) or experts (n = 3) based on their experience. Technical performance was evaluated using competency Cumulative Sum (CUSUM) analysis and the group splitting method. Main outcome measure for CUSUM competency was a composite binary outcome for surgical success, i.e. watertight repair, operation time ≤ 180 min and Objective-Structured-Assessment-of-Technical-Skills (OSATS) score ≥ 18/25. Construct validity was first confirmed since competency levels of novices and experts during their six first cases using both methods were significantly different. Criterion validity was also established as 33 consecutive procedures were needed for novices to reach competency using learning curve CUSUM, which is a number comparable to that of clinical fSB-repair. Finally, we surveyed expert fetal surgeons worldwide to assess face and content validity. Respondents (26/49; 53%) confirmed it with ≥ 71% of scores for overall realism ≥ 4/7 and usefulness ≥ 3/5. We propose to use our high-fidelity model to determine and shorten the learning curve of laparoscopic fetal surgeons and retain operative skills.

www.nature.com/scientificreports/ cadavers) training models. For fSBA-repair, two low-fidelity inanimate models have been proposed, yet without validation 14,15 . However, they may be useful in initial training and help reduce the number of animals needed in further training. Two in vivo SBA models have previously been proposed for feasibility studies, i.e. fetal rhesus monkey and fetal lamb. While only the latter was used for fetoscopic surgery 16 , neither was validated 17 or specifically designed for training. Among smaller animal models, mice and rats cannot be used for surgical training purposes because of their size. High-fidelity rather than low-fidelity surgical models enhance training realism and thus minimize potential harms from the LC in animals as well as in humans following translation to clinical practice 12,13 . Such a model for fSBA-repair requires (1) complex surgical steps like port insertion, dissection and suturing, (2) simulated in a realistic environment, i.e. with proper depth perception, live motion and pulsatile blood flow in arteries and veins, (3) using a living and breathing animal. Moreover, the use of large animal models should be restricted in accordance with the NC3Rs guidelines 18,19 . Therefore, we report on the development and validation of a high-fidelity training model for fSBA-repair in a living and breathing rabbit and its use to determine the number of cases needed for a laparoscopic fetal surgeon to achieve competency.

Methods
Ethical statement. This experiment was approved by the Animal Ethics Committee of the Group Biomedical Sciences of the KU Leuven (P093-2016). It followed the NC3Rs and the ARRIVE guidelines for animal research 18,19 .
Study design. The validation of this animal model followed the consensus guidelines for validation of surgical simulators 20,21 . It was assessed in two phases: construct-criterion and face-content validity 22 . For the former, the study was designed to train three laparoscopic fetal surgeons from a single fetal center, i.e. surgeons experienced in open fetal SBA repair as well as multi-port laparoscopy, yet who had never performed fSBA-repair (LJ, PDC, JD).
We categorized our surgeons into novices and experts in our model for simulated fSBA-repair based on surgical experience. Since we aimed to validate a training model and not the port-access approach, we hypothesized that the one surgeon experienced in both single-port and multi-port neonatal laparoscopy (PDC) was an expert. The multi-port approach is being used clinically for fetoscopic SBA repair 5,23 . By contrast, the single-port approach, currently applied in fetoscopy for twin-to-twin transfusion syndrome (TTTS) or congenital diaphragmatic hernia, is the ultimate minimally invasive technique to aim for 24,25 . The other surgeons (LJ, JD), who had less overall experience in multi-port laparoscopy and no experience in single-port laparoscopy, were considered novices. They were thus trained in the model until competency in single-port fSBA-repair was reached (Supplementary Methods 1 and 2.1, Supplementary Fig. S1, Video 1). Once these two surgeons had completed that training, they were referred to as experts. In the end, these three experts performed multi-port fSBA-repair to confirm their competency (Supplementary Methods 2.2, Supplementary Fig. 1, Video 1). Overall, novices and experts performed the same procedure consisting of 10 surgical steps, yet used a different port-access approach (single- or multi-port).
Description of the model. This live model was developed to mimic the operative steps and conditions present for a clinical multilayered fSBA-repair.
Clinical procedure to mimic. The gestational age at fSBA-repair in humans typically is around 24 weeks of gestation 1,5,26 . At that time the fetal weight is 662 ± 77 g 27 and the abdominal circumference 187 ± 10 mm 27 . The region of interest is lumbar in 95% of cases 1 . The current literature on open and fSBA-repair describes several steps in the procedure, which we summarized into 10 consecutive steps to be simulated ( Fig. 1) 1,5,26,28,29 .
Animal model. Rabbits have previously been used for training in pediatric 30,31 and fetoscopic 32 surgery. We used adult male New Zealand rabbits (weight, 3-4 kg). They were given water and food ad libitum under the standard light-dark cycle until the procedure. They were put under general anesthesia without endotracheal intubation (Supplementary Methods 3). In rabbits, the adequately insufflated abdominal cavity mimics the working space or amniotic cavity, which measures approximately 15 × 10 × 5 cm with a pneumoperitoneum of around 3 L of CO2 at 5 mmHg (Fig. 1A) 32 . As monogastric herbivores, rabbits have a large single-chamber stomach with a circumference of 165 ± 13 mm, mimicking the abdominal circumference of a 22-24-week human fetus (Fig. 1A) 33 . Overall, the 10 clinical steps for fSBA-repair are recapitulated by a laparoscopic gastric Nissen fundoplication 34 and the suturing of a patch to the gastric wall (Fig. 1B, Video 1). These procedures require the ability to gently manipulate fragile tissue and to perform extensive dissection, hemostasis and suturing (Supplementary Methods 3).
Technical performance. Clinical outcomes were total operation time (steps 1 to 9) 8,9 , fetal repair time (steps 4 to 9), CO2 insufflation volume, Objective Structured Assessment of Technical Skill (OSATS) score and watertightness of the patch repair (step 10, Video 1). The latter was tested post-mortem by fluorescein injection under the patch after completion of the repair (Supplementary Methods 4, Video 1). Operative performance and difficulty were assessed by applying an adapted OSATS rating scale to videos of the procedures (Supplementary Methods 4) 35,36 .
We also used a composite binary outcome for surgical success based on clinically relevant outcomes to measure the LC and competency level of fSBA-repair 7 .
Validation study. Construct validity. To assess construct validity, and therefore the ability of our simulator to discriminate performance levels, we determined and compared the competency level of novices and experts during their first six cases by applying two methods. First, the Competency Cumulative Sum (C-CUSUM) test 38 used the aforementioned binary outcome for surgical success and was set with a control limit of h_C = 3 (Supplementary Methods 5.1). Subsequently, we applied the group-splitting method 9 by comparing performance using the five aforementioned clinical outcomes.
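The composite outcome and the C-CUSUM test can be illustrated with a minimal Python sketch. The success criteria (watertight repair, operation time ≤ 180 min, OSATS ≥ 18/25) and the control limit of 3 come from this article. The failure rates `p0` and `p1`, the log-likelihood-ratio weights and the names `Case`, `is_success` and `c_cusum` are assumptions made for illustration only; the settings actually used are described in Supplementary Methods 5.1 and are not reproduced here.

```python
from dataclasses import dataclass
from math import log


@dataclass
class Case:
    watertight: bool      # patch repair passed the post-mortem leak test
    operation_min: float  # total operation time in minutes
    osats: int            # OSATS score out of 25


def is_success(case: Case) -> bool:
    """Composite binary outcome: all three criteria must be met."""
    return case.watertight and case.operation_min <= 180 and case.osats >= 18


def c_cusum(failures, p0=0.10, p1=0.20, h=3.0):
    """Running C-CUSUM scores and the first case (1-based) at which the
    score reaches the control limit h, or None if it never does.

    p0 and p1 are ASSUMED acceptable and unacceptable failure rates.
    """
    w_fail = log(p1 / p0)            # positive: a failure raises the score
    w_ok = log((1 - p1) / (1 - p0))  # negative: a success lowers the score
    score, scores, signal = 0.0, [], None
    for i, failed in enumerate(failures, start=1):
        score = max(0.0, score + (w_fail if failed else w_ok))  # floor at 0
        scores.append(score)
        if signal is None and score >= h:
            signal = i
    return scores, signal


# Hypothetical cases, not data from the study
cases = [Case(True, 150, 20), Case(True, 200, 20), Case(False, 150, 22)]
scores, signal = c_cusum([not is_success(c) for c in cases])
```

In this convention a surgeon whose score stays below the limit is regarded as competent, which matches how the scores are read in the Results.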
Criterion validity. Criterion validity compares performance of our innovative model to the ground truth which is the clinical procedure in our case since, to the best of our knowledge, no animal training model has been validated yet 7 . Herein we compared the learning curve (LC) of novices in our model to the LC of novices performing   7 . 22% (11/49) of the surgeons were performing fSBA-repair in their fetal center at that time. All were invited to try our model out in our research center and under our supervision to get a realistic experience. For obvious geographical reasons, some experts only answered our anonymous online questionnaire (non-users) while others also tried it (users). According to current practice, all survey responses from expert users or non-users were included in the analysis 40 . We also performed a subanalysis of data from users currently performing fetoscopic repair in humans. Expert fetal surgeons were categorized according to their specialty (obstetricians and gynecologists, pediatric neurosurgeons and pediatric surgeons) and demographic data were captured by seven questions. We additionally asked six questions on face validity and five on content validity, using a 7-point and 5-point Likert scale respectively. We set validity thresholds at 4/7 (undecided) and 3/5 (neutral) for each scale (Supplementary Methods 5.3) 39 .
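As an illustration of how the validity thresholds are applied, the short Python sketch below computes the share of Likert scores at or above a threshold. The thresholds (4/7 and 3/5) and the response rate (26/49) are taken from this article; the function name `share_at_or_above` and the example responses are hypothetical.

```python
FACE_THRESHOLD = 4     # 7-point Likert scale, 4 = undecided
CONTENT_THRESHOLD = 3  # 5-point Likert scale, 3 = neutral


def share_at_or_above(scores, threshold):
    """Fraction of Likert scores that are >= threshold."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(s >= threshold for s in scores) / len(scores)


# Hypothetical responses to one face and one content validity question
face_scores = [5, 6, 4, 3, 7, 2, 5]
content_scores = [3, 4, 5, 2, 4]

face_share = share_at_or_above(face_scores, FACE_THRESHOLD)
content_share = share_at_or_above(content_scores, CONTENT_THRESHOLD)

# Response rate reported in the article: 26 of 49 invited surgeons
response_rate_pct = round(100 * 26 / 49)
```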
Statistical analysis. We used GraphPad Prism version 7.00 for Mac OS X (GraphPad, La Jolla, CA, USA) to analyze the data. Binomial and categorical variables were expressed as percentages with their frequency distribution. Continuous variables were tested for normal distribution using the D'Agostino-Pearson (omnibus K2) or Shapiro-Wilk normality tests. Normally distributed continuous variables were presented as mean and standard deviation (SD) and the others were expressed as median and range or interquartile range (IQR).
For face and content validity, categorical and continuous variables based on the responses of the three clinical subspecialties involved in the survey were compared with one-way analysis of variance (ANOVA) or the Kruskal-Wallis test as appropriate. For construct validity, the Fisher exact test was used to compare binomial variables. Continuous variables were compared with the unpaired two-tailed t test or the Mann-Whitney test as appropriate. A p value < 0.05 was considered significant. For construct and criterion validity, we performed C- and LC-CUSUM analysis using an algorithm that we developed in MATLAB software (Mathworks, Natick, MA, USA) based on and verified with the model of Biau et al. 7 .
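For readers unfamiliar with the Fisher exact test used for binomial variables, the following is a minimal Python sketch of the two-sided test for a 2 × 2 table, written with the standard library only. It is not the GraphPad Prism implementation used in the study, and the counts in the example are hypothetical, not study data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum all tables that are at least as unlikely as the observed one
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))


# Hypothetical counts: rows are groups, columns are success / failure
p_value = fisher_exact_two_sided(1, 5, 6, 0)
significant = p_value < 0.05
```

The small tolerance factor guards against floating-point ties when comparing table probabilities.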

Results
Surgical procedures. The data below were collected from 52 completed single-port simulated fSBA-repairs by two novices (n = 34 and n = 18), and from 18 multi-port simulated fSBA-repairs by three experts (n = 6, n = 6 and n = 6).
Construct validity. Figure 2A displays the evolution of the C-CUSUM score of each novice and expert during the first six cases. As the C-CUSUM scores of the three experts remained below the competency control limit h_C = 3, they were considered competent. In contrast, the scores of both novices reached 3.07 after the 6th case, exceeding that limit, and they were therefore considered not competent (graphs of Fig. 2A). The two groups were also significantly different for all surgical outcomes measured when applying the group-splitting method (table of Fig. 2A).
Criterion validity. LC-CUSUM analysis of all the cases performed demonstrated that novice 1 reached competency after 33 out of 34 cases, when his score was 0.89, exceeding the control limit of h_LC = 0.85 (Fig. 2B, left graph). To reduce the number of animals used, novice 2 performed only 18 procedures and did not reach that threshold. In the best-case scenario, this novice was predicted to reach competency at 31 cases (Fig. 2B, right graph). These numbers, which involve laparoscopic fetal surgeons, are lower than the minimum number reported as required for competency in clinical fSBA-repair performed by non-laparoscopic fetal surgeons (n ≥ 56) 7 .
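The logic of an LC-CUSUM analysis can be sketched in Python as follows. Here the score rises with each success and competency is declared once it reaches the control limit (0.85 in this article). The failure rates `p_inadequate` and `p_adequate`, the sign convention and the function name `lc_cusum` are assumptions for illustration; they are not the parameters of the authors' MATLAB algorithm, so the sketch does not reproduce the case numbers reported above.

```python
from math import log


def lc_cusum(failures, p_inadequate=0.30, p_adequate=0.15, h=0.85):
    """Running LC-CUSUM scores and the first case (1-based) at which the
    score reaches h, i.e. competency is declared, or None.

    The null hypothesis is inadequate performance; both failure rates
    are ASSUMED values chosen only for illustration.
    """
    w_ok = log((1 - p_adequate) / (1 - p_inadequate))  # positive weight
    w_fail = log(p_adequate / p_inadequate)            # negative weight
    score, scores, reached = 0.0, [], None
    for i, failed in enumerate(failures, start=1):
        score = max(0.0, score + (w_fail if failed else w_ok))  # floor at 0
        scores.append(score)
        if reached is None and score >= h:
            reached = i
    return scores, reached


# Hypothetical sequence: three failed cases followed by five successes
scores, reached = lc_cusum([True] * 3 + [False] * 5)
```

With these assumed rates, five consecutive successes are needed to cross the limit, so early failures simply delay the case at which competency is declared.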
Demographics of survey respondents. The response rate from the fetal surgeons for SBA to the online survey was 53% (26/49). 38% (10/26) of the respondents tried the model out to get a realistic experience and 27% (7/26) were performing fSBA-repair in their fetal center (Supplementary Table S2). The demographics of the different subspecialists involved were comparable, except that pediatric neurosurgeons had less experience in laparoscopic surgery (Table 1). None of the respondents was aware of computer simulators or high-fidelity models for fSBA-repair.
Face validity. The three subspecialties of respondents considered our live model realistic, since all survey questions reached scores ≥ 4/7 in ≥ 60% of cases, and recommended it for its realism (≥ 84% of scores ≥ 4/7; Table 2). Good to exceptional scores (63 to 100% of scores ≥ 4/7) were reached for questions on aspects that require a living and breathing animal. These encompassed the surgical target, surgical steps such as dissection, resection and suturing, and depth perception (Table 2). Despite some discrepancies, there were no significant differences in ratings between the subspecialists. In addition, the seven fetoscopic experts who used our model confirmed its realism as 95% of survey questions scored ≥ 4/7. They also recommended it for its realism (100% of scores ≥ 4/7; Supplementary
Content validity. There were no significant differences in ratings between the subspecialists. Respondents considered the model useful and would recommend it for training (≥ 71% of scores ≥ 3/5) and for improving complex fetoscopic skills, such as instrument handling and suturing (≥ 83% and 100% of scores ≥ 3/5, respectively; Table 2). Average scores (≥ 50%), which varied among the three subspecialties, were obtained when we asked whether the model exposes the trainee to stress similar to that in clinical conditions and whether it is useful to train for fetal positioning (≥ 50% and ≥ 60% of scores ≥ 3/5, respectively). In contrast, high scores (≥ 83% of scores ≥ 3/5) concerned the usefulness of the model for instrument handling, suturing, self-confidence, insufflation, tissue mobilization, skin closure and quality assessment of the patch repair. Finally, the seven fetoscopic experts who used our model confirmed its usefulness as 98% of survey questions scored ≥ 3/5. They also recommended it for training (100% of scores ≥ 3/5; Supplementary Table S3).

Discussion
Main findings. We developed and validated a high-fidelity training model for fetoscopic SBA repair in live rabbits. We first demonstrated that competency of laparoscopic fetal surgeons was reached at 33 consecutive cases. That number is lower than what has been reported for non-laparoscopic fetal surgeons performing clinical multi-port fSBA-repair. Surveyed fetal experts also confirmed face and content validity.

Clinical interpretation.
A recent systematic review with available individual patient data on multi-port fSBA-repair demonstrated that the LC to reach competency was at least 56 cases 7 . This is of the same order of magnitude as for other complex multi-port laparoscopic surgeries such as colectomy 41,42 or sacrocolpopexy 43 performed by surgeons without previous experience in these techniques. In our high-fidelity model the LC was 33 cases for novices, more precisely surgeons experienced in multi-port laparoscopic surgery but inexperienced in single-port surgery. That number is in keeping with other complex single-port procedures such as colectomy performed by surgeons experienced in multi-port laparoscopic surgery 44 . It is also similar to that of other advanced, yet less complex, multi-port endoscopic procedures, such as cholecystectomy 45 , pyloromyotomy 46 or the most common fetoscopic operation (laser coagulation for TTTS) 24 . These are described as less complex as they do not require suturing and extensive dissection skills (or, in the case of TTTS, none at all). We therefore surmise that the number of 33 reached in our simulator is the LC of surgeons experienced with complex multi-port laparoscopy before translation to clinical practice. It may represent an underestimation for surgeons inexperienced with complex multi-port laparoscopy. Clinically, the challenges of fetal surgery are greater than what can be simulated, such as the complex pathologic anatomy of the lesion, the frailty of human fetal tissue, the interference of fetal monitoring, the presence and vicinity of the placenta, the large number of people and specialties around the operation table, or simply the stress of operating on two patients. These points were suggested by 6/26 (23%) of the survey participants (3/6 pediatric neurosurgeons, 2/9 pediatric surgeons and 1/11 obstetricians and gynecologists).
[Table fragment: rows for left-handedness (0, 1, 2), number of fetal SB open repairs in humans, and number of training sessions on a fetoscopic surgery simulator (virtual reality: 0 ± 0, 0 ± 0, 0 ± 0; NA).]

Strengths and limitations.
We acknowledge a number of limitations to our study. First, some surveyed fetal surgeons raised concerns regarding the realism of specific aspects of our model. Indeed, the simulated procedure does not mimic the precise dissection and gentle manipulation of the dura, musculo-fascial flaps and fetal skin. However, manipulation, dissection and suturing of the stomach are quite comparable to these clinical steps, as the rabbit stomach can be easily damaged and perforated. In this way, those steps unmask potential clinically relevant complications. Secondly, only three laparoscopic fetal surgeons were involved in our fetal surgery training, to reduce the number of animals required and thereby follow the ethical standards of the NC3Rs guidelines 18,19 . Our competency analysis allowed us to confirm our hypothesis about the competency level of novices and experts. Since experts 1 and 2 were previously novices 1 and 2, these surgeons became experts in three-port surgery in our rabbit training model (yet not for clinical fSBA-repair) following their training as novices in single-port surgery. Finally, we only tested our model for a single-port and a three-port approach 23 . The model can easily be adapted to a two-port approach 26 , currently practiced by some centers.
Our study also has considerable strengths. Firstly, in the development of a simulator, we followed the consensus guidelines for animal research and validation of animal models 18,39 and surgical simulators 13,14,22 . Secondly, we applied robust methods for assessing subjective and objective validity. Thirdly, we measured the LC and competency level of both experts and novices with standardized methodology 38 . Finally, our observations seem clinically relevant as we come to numbers that are comparable to what has been demonstrated for complex clinical laparoscopic procedures performed by trained laparoscopic surgeons.

Conclusion
We developed and validated a high-fidelity model for fetoscopic layered SBA repair. It was used to determine the learning curve of laparoscopic fetal surgeons, which was in the range of other complex endoscopic procedures. We propose the use of this model to determine and shorten the learning curve of laparoscopic fetal surgeons, and aid retention of operative skills.