The neurocognitive gains of diagnostic reasoning training using simulated interactive veterinary cases

The present longitudinal study ascertained training-associated transformations in the neural underpinnings of diagnostic reasoning, using a simulation game named “Equine Virtual Farm” (EVF). Twenty participants underwent structural, EVF/task-based and resting-state MRI and diffusion tensor imaging (DTI) before and after completing their training on diagnosing simulated veterinary cases. Comparing playing veterinarian versus seeing a colorful image across training sessions revealed a transition of brain activity from scientific creativity regions pre-training (left middle frontal and temporal gyrus) to insight problem-solving regions post-training (right cerebellum, middle cingulate and medial superior frontal gyrus and left postcentral gyrus). Further, applying linear mixed-effects modelling to graph centrality metrics revealed the central roles of the creative semantic (inferior frontal, middle frontal and angular gyrus and parahippocampus) and reward systems (orbital gyrus, nucleus accumbens and putamen) in driving pre-training diagnostic reasoning, whereas regions implicated in inductive reasoning (superior temporal and medial postcentral gyrus and parahippocampus) were the main post-training hubs. Lastly, resting-state and DTI analysis revealed post-training effects within the occipitotemporal semantic processing region. Altogether, these results suggest that simulation-based training transforms diagnostic reasoning in novices from regions implicated in creative semantic processing to regions implicated in improvised rule-based problem-solving.


Subjects.
A total of 20 healthy participants (11 females, mean age: 25.65 years) were invited twice to the Center of Cognitive Neuroscience (CCNB) at the Free University Berlin (FUB) for their pre- and post-training MRI scans. All participants were right-handed and had no history of neurological or psychiatric diseases or medications. All sections of the experiment were performed in accordance with the guidelines of the Declaration of Helsinki. The CCNB review board and the ethics committee of FUB approved all procedures. All participants provided written informed consent before the MRI scans and were paid 70 € for their participation.
Experimental design and procedure. Following the designs of cognitive training studies 35,36 , our study encompassed three phases: (1) the pre-training phase (day 1); (2) the home training phase (days 2-6); and (3) the post-training phase (day 7). In the pre-training phase, participants were invited to CCNB for their first MRI scan. After receiving information about MRI safety requirements and signing a consent form, participants started with a 5 min structural MRI (sMRI), followed by 20 min of task-based functional MRI (fMRI), 10 min of eyes-open resting-state MRI (rsMRI) and a final 15 min of diffusion tensor imaging (DTI). The home training phase started on the next day and continued for another 4 consecutive days, during which participants played EVF (see Supplementary Fig. 1) for a minimum of 1 hour per day. To ensure adherence to this training regimen, participants were asked to save their progress and send the saved files (XML files) by email daily. These XML files recorded all the laboratory procedures they had completed within the virtual laboratory as well as their diagnostic and therapeutic hypotheses for the horses. Based on their performance, additional tips and reinforcement emails were sent to each participant to keep them on track. After 5 consecutive days of training, each participant was invited again to CCNB for the post-training scan, and all participants were reimbursed for their participation. Pre- and post-training scans used the same acquisition parameters and sequences.
During fMRI, each participant played a veterinarian within EVF for 20 minutes, using a trackball mouse (Fig. 1e). Within those 20 minutes, participants interacted with five horses in the horse yard and examined their physical performance on a treadmill and during feeding, drinking and walking. Next, they collected and prepared blood samples from the horses to perform laboratory tests, and then wrote a laboratory report. To reach a final diagnosis, participants could use an office within EVF, where they could read interactive books and scientific papers on a virtual laptop. Because EVF was developed as a stand-alone desktop application, a C++ script was integrated into MATLAB to generate a scanner-synchronized, accurately timed presentation of EVF (Fig. 1a,c,d), alternating with static visual stimuli (Vis) (Fig. 1b). Each participant played through a total of 24 time blocks: twelve 80-s EVF blocks and twelve 20-s Vis blocks. To meet the study aim of investigating the neural correlates of analytic reasoning, another version of EVF was developed for the second (post-training) scan. In that new version, problem-based scenarios (3D animations and positions) were modified and assigned to different horses. These modifications were intended to create conditions of uncertainty that would prompt participants to engage their analytic reasoning strategies.
Image acquisition. MRI data were collected using a Siemens Vision 3-T Tim Trio scanner (Siemens, Erlangen, Germany) with a standard 12-channel head coil. Anatomical data (sMRI) were acquired using the MPRAGE sequence (TE = 2.52 ms, TR = 1,900 ms, TI = 900 ms, flip angle = 9°, FOV = 256 mm, voxel size = 1 × 1 × 1 mm³, 176 sagittal slices). An echo planar imaging (EPI) sequence (TE = 30 ms, TR = 2,000 ms, flip angle = 70°, voxel size = 3 × 3 × 3 mm³, FOV = 192, 37 interleaved axial volumes) was used to acquire both task-based (fMRI) and resting-state (rsMRI) data.
Finally, diffusion tensor imaging (DTI) data were acquired using an EPI-based single-shot spin-echo diffusion sequence (mz_ep2d_diff_free; TE = 94 ms, TR = 10,000 ms, voxel size = 2 × 2 × 2 mm³, FOV = 208, phase FOV = 100%, 69 transversal slices, phase partial Fourier = 6/8, flip angle = 90°, b-value = 1,000 s/mm², bandwidth = 1,602 Hz/Px, echo spacing = 0.69 ms, EPI factor = 104, diffusion directions = 61). Distortion correction was implemented using a 3 min point spread function (psf) calibration scan (mz_ep2d_psf) with the same diffusion acquisition parameters 37 . Motion was corrected during reconstruction using a previously acquired reference scan. Total scanning time was ~1 hour (50 min), encompassing one sMRI, one fMRI, one rsMRI and two DTI scans (mz_ep2d_psf and mz_ep2d_diff_free). Cushions were placed around the head to minimize head movements, and stimuli were presented through a mirror mounted on the head coil.
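The task timing and scan durations reported above can be checked with a short sketch. The block lengths, the repetition time (TR = 2,000 ms) and the per-scan durations are taken from the text; the assumption that each cycle starts with an EVF block, and all function and variable names, are illustrative only.

```python
# Values from the text: 12 x 80-s EVF blocks, 12 x 20-s Vis blocks, TR = 2 s.
TR_S = 2.0
EVF_BLOCK_S = 80
VIS_BLOCK_S = 20
N_CYCLES = 12

# Scan durations (minutes) as listed in the experimental procedure.
SCAN_MIN = {"sMRI": 5, "fMRI": 20, "rsMRI": 10, "DTI": 15}


def build_blocks(n_cycles=N_CYCLES, evf_s=EVF_BLOCK_S, vis_s=VIS_BLOCK_S):
    """Return (condition, onset_s, duration_s) tuples for alternating blocks.

    Assumption: every cycle starts with EVF and is followed by Vis.
    """
    blocks = []
    onset = 0
    for _ in range(n_cycles):
        blocks.append(("EVF", onset, evf_s))
        onset += evf_s
        blocks.append(("Vis", onset, vis_s))
        onset += vis_s
    return blocks


blocks = build_blocks()
task_s = sum(duration for _, _, duration in blocks)
n_volumes = int(task_s / TR_S)          # EPI volumes in one task run
total_scan_min = sum(SCAN_MIN.values())  # whole protocol, in minutes

print(len(blocks), task_s / 60, n_volumes, total_scan_min)
```

Under these assumptions the 24 blocks fill the 20 min task run exactly, which corresponds to 600 EPI volumes at the stated TR, and the four scan durations add up to the reported 50 min.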
Image preprocessing. Functional MRI Preprocessing: Images from fMRI, sMRI and rsMRI were preprocessed using SPM12 (Wellcome Trust Center for Neuroimaging, UCL). For each participant, preprocessing started with the realignment of functional images using rigid-motion transform, followed by slice-timing correction. Each structural T1 image was normalized to MNI space, using the unified segmentation and normalization in SPM12, and co-registered to the mean functional image. Then, the estimated parameters of these transformations (i.e. co-registration and normalization) were used to normalize functional and resting-state images (fMRI and rsMRI), which were then smoothed using a Gaussian smoothing kernel (FWHM = 6 mm).
Voxel-based morphometry: A voxel-based morphometry (VBM 38 ) analysis was performed with the Computational Anatomy Toolbox (CAT-12) to investigate voxel-wise grey matter changes before and after EVF training. CAT-12 default settings were used to segment T1-weighted MRI data into gray matter (GM), white matter and cerebrospinal fluid, followed by spatial normalization into MNI space using the DARTEL template. Next, the preprocessed GM data were smoothed using a 6 mm FWHM Gaussian kernel.
Brain activity analysis. Functional MRI Analysis: For each subject, both fMRI and rsMRI data were modelled using the General Linear Model (GLM). For fMRI data, the GLM included two predictors for the EVF and Vis conditions and 6 nuisance regressors for the realignment parameters. In the first-level analysis, four contrasts were specified: 1) EVF > baseline [1 0]; 2) Vis > baseline [0 1]; 3) Vis > EVF [−1 1]; 4) EVF > Vis [1 −1]; and t- and beta-maps were generated for each contrast per subject. The GLM of rsMRI, in contrast, included only the six realignment parameters as nuisance regressors and no contrasts. Next, between-group analyses were performed on fMRI data using second-level random-effects analysis. Second-level analysis started with one-sample t-tests on the EVF > Vis contrasted beta-maps from the pre-training (PRE) and post-training (POST) groups, separately. Then, paired t-tests were performed to compare the EVF > Vis contrasted beta estimates between PRE and POST groups. Between-group analysis generated two t-maps: 1) pre- vs. post-training (PRE > POST), and 2) post- vs. pre-training (POST > PRE). Activation clusters were regarded as significant at P < 0.05 FWE-corrected (family-wise error corrected for multiple comparisons) and cluster size k ≥ 10.
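As a minimal illustration of the first-level contrasts, the sketch below applies contrast weights to the two condition betas of a single voxel. The weights for the first three contrasts are those listed in the text; the EVF > Vis weights [1 −1] are taken as the mirror image of Vis > EVF [−1 1]. The beta values are hypothetical, and the six nuisance regressors are omitted because they would carry zero weight in every contrast.

```python
# Contrast weights over the two condition regressors, ordered (EVF, Vis).
CONTRASTS = {
    "EVF > baseline": (1, 0),
    "Vis > baseline": (0, 1),
    "Vis > EVF": (-1, 1),
    "EVF > Vis": (1, -1),
}


def contrast_estimate(betas, weights):
    """Weighted sum of beta estimates, i.e. c' * beta for one voxel."""
    return sum(w * b for w, b in zip(weights, betas))


# Hypothetical beta estimates for one voxel: (EVF, Vis).
betas = (2.5, 1.0)
estimates = {name: contrast_estimate(betas, w) for name, w in CONTRASTS.items()}

for name, value in estimates.items():
    print(f"{name}: {value:+.2f}")
```

The two directional contrasts are sign-flipped copies of each other, which is why the second-level PRE > POST and POST > PRE maps can both be derived from the EVF > Vis beta-maps alone.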
Brain connectivity analysis. Functional Connectivity Analysis: Task-based (f-) and resting-state (rs-) connectivity analysis started with extracting the mean time-series of 120 anatomically defined regions (VOIs) (see Supplementary Table 1), using the "Volume of Interest" batch function in SPM12 Utilities. GLM-adjusted BOLD responses served as both the task-based and the resting-state time-series; the task-based time-series were additionally adjusted for the EVF > Vis contrast at each voxel (see Supplementary Equations 1-3). Mean time-series were estimated by averaging the voxels composing each anatomical region at each time point, using Neuromorphometrics probabilistic atlas masks (http://www.neuromorphometrics.com). Then, Pearson's correlation coefficients were computed for each pair of time-series, resulting in 4 adjacency matrices (2 time-series classes [fMRI and rsMRI] x 2 training sessions) per participant (see Supplementary Fig. 2a). To approach a normal distribution, Fisher's r-to-z transformation was applied to the correlation coefficients, using MATLAB's "atanh" function (see Supplementary Fig. 2b). Next, the adjacency matrices were represented for each subject as undirected graphs G(V,E), where the nodes (V) were the VOIs and the edges (E) were the absolute values of the z-transformed correlation coefficients (weighted edges) 39 . For each graph, the total number of nodes (V) was 120 and the total number of edges (E) was V(V − 1)/2 (the upper triangular subset of the adjacency matrix; see Supplementary Fig. 2b). Graph centrality metrics, including degree, eigenvector, closeness, betweenness and PageRank 40 , were then computed for each region (node), using the graph centrality functions in MATLAB 2017 (https://de.mathworks.com/help/matlab/ref/graph.centrality.html).
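The graph construction described above can be sketched in a few lines of standard-library Python: pairwise Pearson correlation, Fisher's r-to-z transform (the same operation as MATLAB's "atanh"), absolute values as edge weights, and the edge count V(V − 1)/2. This is not the authors' MATLAB pipeline; the three toy time-series are hypothetical, and weighted degree (node strength) stands in here for the simplest of the five centrality metrics.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def weighted_adjacency(series):
    """Symmetric matrix of |Fisher z| edge weights with a zero diagonal."""
    v = len(series)
    adj = [[0.0] * v for _ in range(v)]
    for i in range(v):
        for j in range(i + 1, v):  # upper triangle only, then mirror
            z = math.atanh(pearson_r(series[i], series[j]))
            adj[i][j] = adj[j][i] = abs(z)
    return adj


def n_undirected_edges(v):
    """Number of node pairs in an undirected graph without self-loops."""
    return v * (v - 1) // 2


def weighted_degree(adj):
    """Node strength: sum of the edge weights attached to each node."""
    return [sum(row) for row in adj]


# Hypothetical mean time-series for three regions (five time points each).
series = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 1.0, 4.0, 3.0, 6.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
adj = weighted_adjacency(series)
strength = weighted_degree(adj)

print(n_undirected_edges(120))  # edges in a fully connected 120-node graph
print([round(s, 3) for s in strength])
```

For the 120 VOIs used in the study, the edge count evaluates to 7,140 unique region pairs per graph.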
To explore the training-associated transformations in brain region centralities, linear mixed-effects (LME) models were fitted for each graph centrality measure, using the "fitlme" function in MATLAB 2017 (https://de.mathworks.com/help/stats/fitlme.html). For each centrality (C) measure, an LME model was fitted for each time-series class (i.e. task-based or resting-state) separately, where the centrality metric (C) was the response variable and the interaction between training sessions (T) and VOIs (V) formed the fixed effects. Both T and V predictor variables were coded as dummy variables with values indicating their corresponding categorical levels across 20 (subjects) x 117 (VOIs) x 2 (Training) centrality observations. To account for both between- and within-subject variance components, LME intercepts were allowed to vary randomly across subjects (S), VOIs grouped by subjects (S:V), or training sessions (T) grouped by VOIs and subjects (S:V:T) (Table 1). Based on previous studies 41-43 , accounting for these mixed random effects models the true variability more accurately and offers superior statistical power in detecting longitudinal group differences. The resulting LME models were fitted using maximum likelihood estimation (ML) with resting-state, white matter and pre-training set as reference levels (i.e. coefficients set to zero). The random-effects structure of each model was determined independently by model comparison using the likelihood ratio test (LRT). The LRT compared the observed likelihood ratio (LR) statistic of the compared models with its chi-squared reference distribution, and the best-fitting models were identified with their appropriate random effects based on the "smaller-is-better" Akaike information criterion (AIC) (see Supplementary Table 2).
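The model-comparison step can be illustrated with a small sketch of the two quantities involved: the likelihood ratio statistic with its chi-squared p-value, and the AIC. The log-likelihoods and parameter counts below are hypothetical and do not come from the study. The sketch assumes the two models are nested and differ by exactly one parameter (one extra variance component), which allows the chi-squared tail probability with 1 degree of freedom to be written with the complementary error function.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion; smaller values indicate a better fit."""
    return 2 * n_params - 2 * log_likelihood


def lrt_one_df(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns the LR statistic and its p-value under a chi-squared reference
    distribution with 1 degree of freedom, using
    P(X > x) = erfc(sqrt(x / 2)) for X ~ chi-squared(1).
    """
    lr = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


# Hypothetical maximized log-likelihoods and parameter counts.
loglik_simple, k_simple = -1520.0, 4    # e.g. one random intercept term
loglik_complex, k_complex = -1510.0, 5  # one additional variance component

lr_stat, p = lrt_one_df(loglik_simple, loglik_complex)
print(f"LR = {lr_stat:.2f}, p = {p:.2e}")
print(f"AIC simple = {aic(loglik_simple, k_simple):.1f}")
print(f"AIC complex = {aic(loglik_complex, k_complex):.1f}")
```

In this toy case both criteria favor the richer model: the LR statistic is large relative to its reference distribution, and the AIC is smaller despite the penalty for the extra parameter.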
Because the present study aimed for exploring the different central roles of VOIs across training sessions (PRE and POST), significant interaction effects (coefficients) between individual VOIs and training sessions were reported at P < 0.05 and visualized using the BrainNet Viewer software (www.nitrc.org/projects/bnv).
Table 1. LME models with different random effects. Models (1a), (1b) and (1c) share the same fixed effects (T and V) and differ in the random intercept, which varied across subjects (S), across VOIs grouped by subjects (S*V), or across training sessions grouped by VOIs and subjects (S*V*T). C = centrality response variable; T = dummy variable representing the 2 training group categories; V = dummy variable representing the 117 VOI categories (4 white-matter regions were set to the same category, hence 117 instead of 120); S = subjects variable; m = 1, 2, 3, …, 4800 (centrality observation indices = 20 subjects x 120 VOIs x 2 training groups); k = 1, 2 (T category indices); i = 1, 2, 3, …, 117 (V category indices); S*V = intercept variance across V grouped by S; S*V*T = intercept variance across V grouped by T and S.
Structural Connectivity Analysis: Tract-Based Spatial Statistics (TBSS) preprocessing and analysis were performed in FSL on the generated DTI parameter maps. TBSS preprocessing started with the nonlinear alignment of all participants' FA maps to a standard space (FMRIB58_FA), followed by their affine transformation into MNI152 space. Then, a mean FA image was generated and thinned to create a mean FA skeleton from all subjects' standard-space FA images. Finally, each participant's FA, MD, L1, L2 and L3 maps were projected onto the mean FA skeleton after applying an FA threshold of 0.1. The resulting projected maps were then used for TBSS group analyses. To identify significant differences in DTI parameters between PRE and POST groups, voxel-wise paired t-tests were performed on FA, MD, L1, L2 and L3 maps using the nonparametric FSL permutation tool "randomise" (http://www.fmrib.ox.ac.uk/fsl/randomise). To correct for multiple comparisons, the contrasts (PRE > POST, POST > PRE) were analyzed with 5000 random permutations using threshold-free cluster enhancement (TFCE) with a significance threshold of P < 0.05 FWE-corrected.
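The logic of a permutation test on paired data can be sketched for a single skeleton voxel. Under the null hypothesis of no training effect, the sign of each subject's POST − PRE difference is exchangeable, so the null distribution is built by random sign flips. This is only a single-voxel illustration of the principle, assuming sign-flipping as the permutation scheme; it does not reproduce TFCE or the family-wise error correction across voxels, and the FA values are hypothetical.

```python
import random


def paired_signflip_pvalue(pre, post, n_perm=5000, seed=0):
    """One-sided permutation p-value for mean(post - pre) > 0.

    The sign of each within-subject difference is flipped at random,
    and the observed labelling is counted as one of the permutations.
    """
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    observed = sum(diffs) / n
    rng = random.Random(seed)
    count = 1  # the observed labelling itself
    for _ in range(n_perm):
        perm_mean = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if perm_mean >= observed:
            count += 1
    return observed, count / (n_perm + 1)


# Hypothetical FA values at one skeleton voxel for eight subjects.
fa_pre = [0.40, 0.42, 0.38, 0.45, 0.41, 0.39, 0.43, 0.44]
fa_post = [0.42, 0.45, 0.39, 0.48, 0.44, 0.40, 0.46, 0.47]

mean_diff, p_value = paired_signflip_pvalue(fa_pre, fa_post)
print(f"mean difference = {mean_diff:.4f}, p = {p_value:.4f}")
```

With eight subjects there are only 2^8 = 256 distinct sign patterns, so the smallest attainable p-value is bounded by the sample size rather than by the number of random permutations drawn.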
Voxel-based morphometry. To identify the significant differences in grey matter density between PRE and POST groups, volume differences were first assessed for PRE and POST groups using a voxel-wise GLM analysis. Then, paired t-tests were performed to compare GLM beta estimates between PRE and POST groups. Clusters were considered significant at P < 0.05 FWE-corrected and cluster size k ≥ 10.
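For reference, the paired t-statistic used in the PRE versus POST comparisons reduces to a one-sample t-test on the within-subject differences. The sketch below computes the statistic and its degrees of freedom for one voxel; the four value pairs are hypothetical, and the p-value and the FWE correction applied in the study are not reproduced.

```python
import math


def paired_t(pre, post):
    """Paired t-statistic and degrees of freedom for post versus pre."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    standard_error = math.sqrt(var_diff / n)
    return mean_diff / standard_error, n - 1


# Hypothetical values for four subjects at one voxel.
pre_values = [1.0, 2.0, 3.0, 4.0]
post_values = [2.0, 4.0, 5.0, 5.0]

t_stat, dof = paired_t(pre_values, post_values)
print(f"t({dof}) = {t_stat:.3f}")
```

Because only the differences enter the statistic, stable between-subject variation in the measured values does not inflate the error term.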

Results
Behavioral results. Nineteen of 20 participants successfully completed their functional MRIs and training sessions; one participant successfully completed all other requirements but did not have a pre-training DTI scan. Post-training MRI sessions revealed a significant improvement in participants' laboratory and diagnostic skills. The number of solved cases was significantly higher (t(19) = 2.65, p < 0.016, SD = 0.76) during the post-training session. However, only 35% of the participants managed to submit an accurate diagnosis for at least one case during their 20-min post-training MRI session. The remaining participants needed more time to gather additional information to corroborate their unresolved preliminary diagnoses. Most of those remaining participants reported after their MRI sessions that, although they knew the correct diagnosis, they needed to test all their hypotheses before submitting their final report. Accordingly, the behavioral results were not included in the brain activity and connectivity models, because doing so was expected to lead to faulty assumptions about the neural correlates.
Functional activity analysis. Using random-effects analysis, comparing pre-training with post-training sessions (PRE > POST) for the EVF > Vis contrast revealed significant activation clusters within the left inferior frontal gyrus (L-IOpFG) and inferior/middle temporal gyrus (L-ITG/MTG). These clusters overlapped with the middle frontal gyrus (MFG) and the middle temporal/fusiform gyrus (MTG/FuG), respectively. The reverse contrast (POST > PRE) showed significant cluster activity in the right cerebellum, middle cingulate (R-MCG) and medial superior frontal gyrus (R-MSFG) and left postcentral gyrus (L-POG) (Table 2, Fig. 2).
For both task-based (fMRI) and resting-state (rsMRI) centrality metrics, comparing LME models (see Supplementary Table 2) showed model (1a) fitting significantly better than models (1b) and (1c) for degree and closeness, whereas model (1b) was the best-fitting model for eigenvector and PageRank. For betweenness, model (1b) best explained the variance in the fMRI data, while model (1c) was the best model for rsMRI. Table 3 and Fig. 3A,B focus on the significant interactions between the individual VOI categories (V) and the training sessions (T) in fMRI and rsMRI. For each centrality measure, regions (V) with significant training (T) interaction coefficients were reported with their related t-values at P < 0.05. Within the task-based brain network (fMRI), inferior frontal gyrus (IOpFG), angular gyrus (AG), MFG, parahippocampus (PHG) and orbital gyrus (OrFG) showed significant negative post-training effects (negative coefficients) for eigenvector. Similarly, PageRank revealed negative post-training interactions for the eigenvector regions as well as for superior temporal gyrus (STG) and frontal operculum (FO). In addition, the coefficients of the degree models showed negative post-training effects for IOpFG only.
Lastly, for betweenness, PHG and medial postcentral gyrus (MPOG) showed positive post-training interactions, whereas the accumbens area, frontal pole (FP), putamen, amygdala and cerebellum showed negative post-training effects (Fig. 3A). Within the resting-state brain network, eigenvector and PageRank LME modelling revealed positive post-training interactions for the fusiform gyrus (FuG), frontal pole (FP) and inferior occipital gyrus (IOG).
Voxel-based morphometry (VBM). T1-weighted MRI data were analyzed using VBM, comparing PRE and POST training groups. VBM analysis revealed no significant differences in GM volumes between PRE and POST groups at cluster size k ≥ 10. However, at cluster size k = 1, the POST training group showed significant increases in GM within the left and right putamen, caudate and nucleus accumbens as well as the left thalamus, left entorhinal area, right anterior insula and right hippocampus.

Discussion
The present longitudinal study ascertained training-associated transformations in novices' whole-brain activity and connectivity during diagnostic reasoning, using a simulation game known as Equine Virtual Farm (EVF). Comparing the primary contrast (EVF > Vis) of playing veterinarian (EVF) versus seeing a colorful stationary image (Vis) across training sessions revealed significantly greater activations in the left inferior/middle frontal gyrus (L-MFG/IOpFG) and inferior-or middle temporal/fusiform gyrus (L-ITG/MTG/FuG) during pre-training session (PRE > POST); whereas, the right cerebellum (R-Cerebellum), middle cingulate (R-MCG) and medial superior frontal gyrus (R-MSFG) and left postcentral gyrus (L-POG) showed significantly increased post-training engagement (POST > PRE). Further, graph-based functional connectivity analysis, using LME modelling of centrality metrics, revealed IOpFG, MFG, orbital (OrFG), angular gyrus (AG) and parahippocampus (PHG) with significantly higher connectivity (i.e. eigenvector and PageRank) during pre-training diagnostic reasoning; whereas the superior temporal gyrus (STG) was the only post-training highly connective node. Moreover, LME modelling of betweenness centrality metric showed the central roles of the dopaminergic system (nucleus accumbens and putamen), amygdala and cerebellum in driving pre-training diagnostic reasoning processes; while the PHG and medial postcentral gyrus (MPOG) were the main post-training mediating hubs. On the other hand, resting-state connectivity analysis, using the same task-based LME modelling approach on centrality metrics, revealed higher post-training connectivity for the inferior occipital (IOG) and fusiform gyrus (FuG) and frontal pole (FP). And lastly, structural connectivity analysis using tract-based spatial statistics (TBSS) seems to show diagnostic reasoning training inducing significant structural changes in the occipitotemporal part of middle temporal gyrus (MTG). 
As hypothesized, these findings suggest the transition of novices' whole-brain activity and connectivity during diagnostic reasoning from regions implicated in creative semantic processing (MFG/IOpFG, MTG and AG) 44,45 before training to regions implicated in improvised rule-based problem solving (cerebellum and POG, MSFG and MCG 17,46-48 ) after training.
Activation of the L-MFG/IOpFG has been consistently shown in neuroimaging studies of creative semantic cognition. For instance, Beaty et al. 45 revealed the involvement of both regions in generating unstudied (low-constraint) and novel (high-constraint) semantically related words. In addition, Zhou et al. 46 found that searching for numerical relations among conceptual knowledge while solving mathematical problems elicits greater activations in the L-IOpFG, AG, MTG, FuG, PHG, MSFG and posterior cingulate gyrus (PCG). According to Binder et al. 44 , these seven regions form the semantic network system. Moreover, on comparing artistic and scientific creativity, Shi et al. 49 found a positive correlation between gray matter in the L-MFG and IOG and scientific creativity, emphasizing the crucial role of semantic reasoning in scientific rather than artistic achievements. The significant activity observed in the left MTG and FuG was also consistent with a recent meta-analysis 50 , in which activity of the left MTG and FuG was attributed to creative ideation during semantic divergent thinking tasks. Thus, the activation pattern observed in the present study within the IOpFG, MFG and FuG suggests that novices relied primarily on creative semantic processing to explore and reason about the novel relationships among medical concepts, objects (e.g. laboratory equipment) and events (e.g. blood sampling).
Further, modelling graph centrality metrics using LME analysis extended the brain activity findings of the present study and emphasized the central roles of the semantic network as well as the reward systems in diagnosing novel simulated cases. Specifically, three of the seven semantic network regions 44 , namely IOpFG, AG and PHG, showed higher connectivity (i.e. eigenvector and PageRank) to other brain regions. In addition, the higher connectivity of the orbitofrontal cortex (I-, M- and L-OrFG) suggests that novices relied continuously on monitoring and evaluating the reward values of EVF's various reinforcers 51 . Moreover, the evident increases in betweenness centrality within the dopaminergic system (nucleus accumbens and putamen), amygdala and cerebellum suggest a sustained contribution of the amygdala, putamen and nucleus accumbens to maintaining the balance between cognitive stability (e.g. inhibiting prepotent responses) and cognitive flexibility (e.g. reward-oriented switching between different options, reversal learning) 52,53 , while the cerebellum repeatedly simulates appropriate learning behaviors to support spontaneous improvisation 54 . Thus, besides semantic processing, novices seem to engage affective reward processing and improvised creativity in solving novel problems through trial and error.
Alternatively, having experience in diagnosing diseases seems to shift brain activity and connectivity from creative semantic processing to insight problem solving. Based on previous neuroimaging studies 47,48 , the activities in MSFG, POG and cerebellum were found to be associated with a distinct type of mental preparation that leads to successful insight problem solving (spontaneous problem solving, "Aha" experience). Moreover, Crescentini et al. 55 found increased activations in STG, cingulate gyrus (CG), cerebellum, putamen and precentral gyrus for rule following during inductive reasoning. These findings seem to connect the observed post-training activations in the cerebellum, POG, MSFG and MCG to rule-based problem solving. Further, the higher connectivity (PageRank) of STG and the increased betweenness of the MPOG emphasize the central roles of semantic priming and rule following in recognizing solutions quickly (insightful problem solving) 48,56 . Thus, according to the dual process theory 15 , it is evident that participants engaged the intuitive (associative) and analytical (rule-based) reasoning systems in diagnosing EVF's swapped simulated cases in EVF new version after training. Also, it is worth noting that in the present study, prefrontal activations shifted with experience from left to right hemispheres. These distinct hemispheric activations were consistent with recent neuroimaging research on clinical decision-making 22 that connected the left hemisphere to semantic processing and the right hemisphere to episodic memory retrieval 57 . These studies explain further the observed engagement of PHG, a region that has been consistently associated with visuospatial processing and episodic memory 58 . And intriguingly, it is noteworthy that the observed pre-and post-training brain activities were found to be closely similar to the activation pattern implicated in visuo-spatial creative problem solving (MFG, IOpFG, MCG and MSFG) 59 . 
This latter finding seems to link IOpFG and MFG to creativity, and MCG and MSFG to problem solving. However, future studies are needed to confirm this indication.
Finally, although the post-training changes observed in resting-state and structural connectivity were unexpected, they were consistent with recent neuroimaging research on scientific creativity, which implicated IOG and MTG in scientific semantic processes 60 and insight problem solving 61 . However, a potential limitation might arise from the possibility that these changes might not fully account for EVF training due to the lack of a control group in the present study design.
Taken together, the present study used a novel simulation-based paradigm to study the training-associated transformations in the neural underpinnings of diagnostic reasoning. The present results extend previous neuroimaging studies by clarifying the contribution of semantic processing and insight problem-solving to creative diagnostic reasoning. Moreover, the distinct neural foundations observed in experienced versus inexperienced novices revealed how simulation-based training can shift diagnostic reasoning from creative to rule-based cognitive processes. This neural difference emphasizes the importance of maintaining novelty and challenge within medical training environments for improving the creativity of medical practitioners.
Further, through the LME analysis of functional connectivity, this study showed the contribution of affective (amygdala) and reward-based (nucleus accumbens, putamen, orbital gyrus) processing to driving creative reasoning (middle frontal gyrus and inferior frontal gyrus), and the engagement of semantic processing (parahippocampus) and insight problem solving (superior temporal gyrus) in mediating analytical reasoning. However, although resting-state and structural connectivity analysis revealed potential post-training effects within regions implicated in scientific semantic processing, future research should replicate these findings with control groups.

Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.