Patient–clinician brain concordance underlies causal dynamics in nonverbal communication and negative affective expressivity

Patient–clinician concordance in behavior and brain activity has been proposed as a potential key mediator of mutual empathy and clinical rapport in the therapeutic encounter. However, the specific elements of patient–clinician communication that may support brain-to-brain concordance and therapeutic alliance are unknown. Here, we investigated how pain-related, directional facial communication between patients and clinicians is associated with brain-to-brain concordance. Patient–clinician dyads interacted in a pain-treatment context, during synchronous assessment of brain activity (fMRI hyperscanning) and online video transfer, enabling face-to-face social interaction. In-scanner videos were used for automated individual facial action unit (AU) time-series extraction. First, an interpretable machine-learning classifier of patients’ facial expressions, from an independent fMRI experiment, significantly distinguished moderately painful leg pressure from innocuous pressure stimuli. Next, we estimated neural-network causality of patient-to-clinician directional information flow of facial expressions during clinician-initiated treatment of patients’ evoked pain. We identified a leader–follower relationship in which patients predominantly led the facial communication while clinicians responded to patients’ expressions. Finally, analyses of dynamic brain-to-brain concordance showed that patients’ mid/posterior insular concordance with the clinicians’ anterior insula cortex, a region identified in previously published data from this study 1 , was associated with therapeutic alliance, and self-reported and objective (patient-to-clinician-directed causal influence) markers of negative-affect expressivity. These results suggest a role of patient–clinician concordance of the insula, a social-mirroring and salience-processing brain node, in mediating directional dynamics of pain-directed facial communication during therapeutic encounters.

the rated patient-clinician therapeutic alliance prior to the MRI (see 2 for more comprehensive details on the clinical intake). However, due to the complex study structure of fMRI hyperscanning with online video transfer of full-facial view between scanners, we were not able to obtain video recordings of sufficient quality for artificial intelligence (AI)-based facial recognition and processing from all dyads. Complete and intact face video data were available for 21 individual patient datasets and 24 clinician datasets, resulting in 14 dyads with intact data for both the patient and the clinician. Due to this limited sample, data processing and statistical analyses were not pursued for contrasts between Clinical-Interaction vs. No-Interaction conditions.
Due to the lack of previous data on dynamic concordance, we could not estimate power using dyad-based metrics. In our pilot data from clinicians applying treatment for the evoked heat pain of a 'patient' confederate 3 , we found a mean blood oxygen level-dependent (BOLD) percent signal change (within-subjects) for 'treatment' relative to 'no treatment' of 1.25±1.53 (mean±SD). An a priori power analysis (paired, two-tailed, α=0.05) indicated that 15 subjects would be required for 85% power to detect this effect size (RStudio, function pwr.t.test, package pwr).
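As a rough cross-check, the same a priori calculation can be reproduced in Python (the original analysis used R's pwr package); this sketch assumes a paired Cohen's d equal to the pilot mean/SD ratio.

```python
from statsmodels.stats.power import TTestPower

# Paired-contrast effect size from the pilot data: mean 1.25, SD 1.53
effect_size = 1.25 / 1.53  # Cohen's d, ~0.82

# Solve for the number of subjects at alpha = 0.05 (two-tailed), 85% power;
# TTestPower covers the paired/one-sample t-test case
n = TTestPower().solve_power(effect_size=effect_size, alpha=0.05,
                             power=0.85, alternative="two-sided")
print(n)  # roughly 15 subjects, in line with the reported estimate
```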

Experimental protocol
MRI-compatible video cameras enabled the participants to communicate non-verbally (e.g. eye movements and facial expressions) during the experimental hyperscanning runs. Each dyad completed two experiments: one in which the patient experienced cuff pain while the clinician observed (pain MRI run), and another in which the patient experienced pain while the clinician "treated" the patient's pain with remotely controlled electroacupuncture (pain/treatment MRI run).
In the 'pain MRI' run, the patient received a series of deep pressure pain stimuli to their left lower leg. Patients received 3 moderately painful (individually calibrated to 40/100 pain) and 3 innocuous (30 mmHg pressure) standardized pressure stimuli, of 15 s duration, while the clinician observed. Prior to each pressure stimulus, both participants were shown a visual cue to indicate whether the upcoming pressure stimulus (applied to the patient) would be painful or non-painful (6-12 s jittered, the frame around the partner's face changing color to red or green, respectively). Following each stimulus (4-10 s jittered), patients rated pain intensity (0-100 Visual Analog Scale, VAS, anchors: "No pain", "Most pain imaginable") using an MR-compatible button box. Patients' facial data from this scan were used to train a machine learning model that aimed to discriminate facial expression patterns associated with Pain relative to Innocuous Pressure sensation (see "Statistical analysis").
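The trial structure described above can be sketched as a simple schedule generator. This is an illustrative reconstruction, not the actual presentation software (which was in-house C++); the function name, the fixed rating-window duration, and the randomization scheme are assumptions.

```python
import random

def make_trial_schedule(n_pain=3, n_innocuous=3, seed=0):
    """Generate jittered event onsets (in seconds) for one hypothetical
    'pain MRI' run: a cue precedes each 15 s pressure stimulus by 6-12 s
    (jittered), and the rating screen follows 4-10 s (jittered) after
    stimulus offset."""
    rng = random.Random(seed)
    conditions = ["pain"] * n_pain + ["innocuous"] * n_innocuous
    rng.shuffle(conditions)  # pseudorandom trial order (assumption)
    t, events = 0.0, []
    for cond in conditions:
        stim_onset = t + rng.uniform(6, 12)           # jittered cue period
        rating_onset = stim_onset + 15 + rng.uniform(4, 10)  # post-stimulus jitter
        events.append({"condition": cond, "cue": t,
                       "stimulus": stim_onset, "rating": rating_onset})
        t = rating_onset + 5  # fixed 5 s rating window (assumption)
    return events
```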
In the 'pain/treatment MRI' run, the patient received a series of 12 moderately painful (individually calibrated to 40/100 pain) pressure stimuli to their left leg. During pressure pain, the clinician applied remote electroacupuncture (EA) treatment (pseudorandomized verum, sham, and overt No-Treatment, 15 s duration). Electroacupuncture was chosen as a treatment model since it enabled practitioners to apply a pain treatment relevant to their clinical practice, in an evoked-pain, block-design experimental paradigm suitable for the fMRI hyperscanning environment. Thus, this treatment model provided a more optimized balance of ecological validity and experimental control compared to other pain therapeutic (e.g. pharmacological) models. Importantly, for verum trials, EA was applied using a minimal sub-sensory threshold current level (0.1 mA) in order to avoid unblinding patients due to any sensory feedback from electrical stimulation. This electrical current level was also unlikely to have any significant physiological effect, and pain ratings during verum EA and sham EA were statistically equivalent 2 . Prior to each pressure stimulus, both participants were shown a visual cue (6-12 s jittered, frame around the partner's face changing color) to indicate whether the upcoming pressure pain would be treated (green) or untreated (red) by EA, in order to evoke anticipation for treated or non-treated pain in both patients and clinicians. Correspondingly, clinicians pressed and held either a 'treatment' button or a different 'no-treatment' button (i.e. matching motor preparation and execution across trials) for the duration of applied pressure pain (blue frame). The same instructions were given to both patients and clinicians: "feel free to use your face to express how you're feeling, as long as you keep your head as still as possible." Thus, both patients and clinicians were equally free to express, and respond to, the other's facial expressions.
Following each stimulus, participants rated pain intensity (patients), vicarious pain (clinicians), and affect associated with the previous trial (patients and clinicians) using Visual Analog Scales.

Electroacupuncture stimulation
At the beginning of each MRI session, after the patient was positioned in the scanner, two needles (0.22 mm thick, 40 mm long MR-safe titanium, DongBang Acupuncture Inc, Boryeong, Korea) were inserted proximal to the cuff (2-3 cm depth, acupoints ST-34 and SP-10), and MRI-safe electrodes were attached to each needle. Due to hospital policy, while clinicians were encouraged to actively 'lead' the process, in line with their assigned role as their patient's practitioner, actual needle penetration was performed by a staff acupuncture practitioner under direct supervision by the subject clinician, evident to the patient. The electrodes were connected to an electronic needle stimulation device (2Hz, 0.1mA, AS Super 4 Digital, Schwa-Medico, Wetzlar, Germany), controlled by the computer running the experimental protocol.

Other Materials
Cameras
Each MRI scanner was equipped with MRI-compatible cameras (Model 12M, MRC Systems GmbH, Heidelberg, Germany) attached to the table-mounted mirror, in order to enable online visual communication. Cameras were manually adjusted to capture the full face prior to scanning. The visual stream was projected onto a screen behind the MRI scanner bore, which the participants viewed through the table-mounted mirror. The two-way video stream (20 Hz) was transmitted over a local network (the cross-scanner delay was measured to be consistently < 40 ms) and recorded for use in facial expression analyses.

Microphones
While verbal communication was disabled during scanning to avoid speech-related motion artifacts in the fMRI signal, participants were able to communicate verbally between different MRI scan runs. Speech was recorded using MRI-compatible optical microphones (Fibersound FOM1-MR, Micro Optics Technologies Inc., Cross Plains, WI, USA).

Software for stimulus presentation and signal synchronization
We applied in-house software (C++) for synchronizing fMRI and video signal acquisition between MRI scanners, transferring video and audio, and tracking between-scanner network delay. A laptop in each MRI scanner initiated fMRI scans using a remote trigger, and controlled the video stream, experimental visual stimuli, onset and offset of the leg pressure and EA stimuli, and the recording of in-scanner ratings and videos. The two laptops were connected through a local area network. At the initiation of each fMRI pulse sequence, a signal from the master computer (patient MRI control room) was sent to the slave computer (clinician MRI control room). The current network delay (calculated as the mean of 10 network pings) was estimated, and the fMRI pulse sequences were then initiated locally, adjusted for this network lag, thus ensuring synchronized acquisition timing of the two fMRI time series, video streams, and experimental protocols.
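The synchronization logic can be illustrated schematically in Python (the actual implementation was in-house C++). Here `ping_fn` is a hypothetical stand-in for the network probe, and estimating the one-way delay as half the round-trip time is an assumption about the protocol's details.

```python
import statistics
import time

def estimate_delay(ping_fn, n_pings=10):
    """Estimate the one-way network delay as half the mean round-trip
    time over n_pings probes (the protocol used the mean of 10 pings)."""
    rtts = []
    for _ in range(n_pings):
        t0 = time.perf_counter()
        ping_fn()  # send a probe to the other control room and await the echo
        rtts.append(time.perf_counter() - t0)
    return statistics.mean(rtts) / 2.0

def slave_wait_time(lead_time_s, one_way_delay_s):
    """The master announces a start 'lead_time_s' in the future; by the time
    the slave receives that trigger, 'one_way_delay_s' has already elapsed,
    so the slave waits correspondingly less before starting its sequence."""
    return max(lead_time_s - one_way_delay_s, 0.0)
```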

Statistical analysis
Discrimination of pain states and ranking of facial feature importance
Facial AU timecourses from patients and clinicians during the pain MRI were used to train a nonlinear classifier, using XGBoost, a scalable end-to-end tree boosting algorithm that has shown state-of-the-art performance in a number of diverse machine learning applications 4 .
This method is attractive because its high number of decision trees, trained on bootstrapped subsets of the training dataset, makes it relatively resistant to overfitting. The video timecourses from all available dyads were split randomly into a training (70%) and a test (30%) dataset, ensuring that randomization occurred both within and between similar design blocks to avoid possible information leakage across datasets due to correlation between temporally adjacent frames. In the training set, a genetic search pipeline with an evolutionary algorithm was used to optimize hyperparameter values and their combinations 5 in a 5-fold cross-validation manner. After training, performance in the test set was assessed using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. In order to estimate the unique contribution of each AU feature to the classification model, we computed Shapley additive explanation (SHAP) values. These values, based on a game-theory framework, combine six existing methods for quantifying feature importance in a manner more consistent with human intuition than previous machine learning explanatory approaches 7,8 .
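The train/test workflow above can be illustrated on synthetic data. This sketch uses scikit-learn's gradient boosting as a lightweight stand-in for XGBoost; the genetic hyperparameter search, block-aware splitting, and SHAP step are omitted, and all data and parameter values here are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for facial AU features: 300 frames x 17 AU intensities,
# with two AUs informative about the pain/innocuous label (assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 17))
y = (X[:, 3] + X[:, 7] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# 70/30 split; in the real pipeline the split respected design blocks
# to avoid leakage between temporally adjacent frames
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit the boosted-tree classifier and score the held-out set by ROC AUC
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
```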

Directed information flow of facial expressions between patients and clinicians
We employed video streams during the pain/treatment MRI to investigate how the clinician's facial expression affected the patient's facial expression, and vice versa, using 'Echo-State Granger Causality (GC)', a GC implementation based on recurrent neural networks with a minimal trainable parameter count 9,10 . GC 11 is an umbrella term for statistical methods assessing whether, given two time-series, the information contained in one ('predicting') series improves the prediction of the future of the other ('predicted') series compared to employing the past of the latter only. Typically, two independent models are employed to infer the 'predicted' time-series, where only one model contains explicit information about the 'predicting' time-series. The log-ratio between the residuals of the two models commonly represents the 'strength' of the directed causal connection between the time-series. 'Echo-State GC', which we employed here, has shown superior performance in detecting causality between two (possibly non-linearly) coupled dynamical variables embedded within a multivariate system 9,10 .
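The restricted-versus-full model comparison and log-ratio logic can be shown with a minimal linear sketch (the study itself used an echo-state-network implementation; this ordinary least-squares version is only an illustration of the GC principle):

```python
import numpy as np

def granger_causality(x, y, lag=2):
    """Linear GC of x -> y: log ratio of residual variances from a
    'restricted' AR model of y (its own past only) versus a 'full'
    model that also includes the past of x."""
    n = len(y)
    target = y[lag:]
    # Lagged design matrices: columns are y[t-1..t-lag] and x[t-1..t-lag]
    own = np.column_stack([y[lag - k - 1:n - k - 1] for k in range(lag)])
    other = np.column_stack([x[lag - k - 1:n - k - 1] for k in range(lag)])

    def resid_var(design):
        design = np.column_stack([np.ones(len(target)), design])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        return np.var(target - design @ beta)

    return np.log(resid_var(own) / resid_var(np.hstack([own, other])))

# Toy demo: x drives y, so GC(x -> y) should exceed GC(y -> x)
rng = np.random.default_rng(1)
x, y = np.zeros(500), np.zeros(500)
for t in range(1, 500):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.5 * rng.normal()
```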

Statistical analysis of echo-state Granger causality estimates
An important caveat inherent to Granger causality calculation between time series recorded from participants who perform the same series of tasks in a time-locked manner is that observed causal relationships may also reflect aspects of the shared structure of the experimental paradigm rather than actual directed information flow, i.e. social interaction via facial expression communication. The null hypothesis of no GC (i.e. zero information flow) between interacting partners' facial expressions should therefore only be rejected if GC estimates are higher than the "pseudo GC" contributed by the shared experimental environment and structure of the experimental paradigm, such as stimulus presentation timing. Thus, in order to evaluate the statistical significance of our patient-to-clinician and clinician-to-patient GC estimates, we constructed empirical null distributions for each paradigm using simulated "dyads" of subjects.
Facial expression data from all possible patient-clinician combinations in our dataset ('simulated dyads') were analyzed, excluding the patient-clinician pairings that actually occurred ('real dyads'). This amounts to calculating the distribution of GC strengths between all simulated pairs of individuals who followed the same experimental procedure but did not actually interact. GC estimate distributions for 'real dyads' were contrasted with these null distributions using non-parametric statistics.
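A minimal sketch of this null-construction step (function names and the one-sided empirical p-value convention are illustrative assumptions):

```python
import numpy as np

def simulated_dyad_null(gc_fn, patients, clinicians, real_pairs):
    """Null distribution of GC estimates from all patient-clinician
    pairings that did NOT actually interact; real_pairs is a set of
    (patient_index, clinician_index) tuples for the real dyads."""
    return np.array([gc_fn(patients[i], clinicians[j])
                     for i in range(len(patients))
                     for j in range(len(clinicians))
                     if (i, j) not in real_pairs])

def empirical_p(real_value, null):
    """One-sided empirical p-value: fraction of null estimates at least
    as large as the observed estimate (with the usual +1 correction)."""
    return (np.sum(null >= real_value) + 1) / (len(null) + 1)
```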

MRI acquisition and preprocessing
MRI acquisition
Blood oxygen level-dependent (BOLD) brain fMRI data were collected from each participant in the dyad (Patient scanner: Siemens 3T Skyra; Clinician scanner: Siemens 3T Prisma) using a whole brain, simultaneous multi-slice, T2*-weighted gradient echo-planar imaging pulse sequence (repetition time = 1250 ms, echo time = 33 ms, flip angle = 65˚, voxel size = 2 mm isotropic, number of slices = 75, multiband acceleration factor = 5, 624 volumes split into 2 consecutive scan runs). Since the Siemens 3T Skyra has a slightly larger bore space than the Prisma, we used this scanner for the fibromyalgia patient group, in order to maximize scanner comfort. Furthermore, keeping a designated "patient scanner" and "clinician scanner", rather than randomizing scanner assignment between dyads, improved protocol consistency within patient and clinician groups and facilitated the setup of our hyperscanning infrastructure.

fMRI preprocessing
Preprocessing of individual fMRI datasets was carried out using tools from FMRIB's Software Library (FSL, v6.0.0; www.fmrib.ox.ac.uk/fsl), and included the following steps: slice-timing correction, motion correction (MCFLIRT) 12 , correction of spatial inhomogeneity (TOPUP) 13,14 , non-brain tissue removal (BET) 15 , spatial smoothing (full width at half maximum = 4 mm), temporal high-pass filtering (f=0.011 Hz, as computed by FSL's cutoffcalc), and grand-mean intensity normalization by a single multiplicative factor. For each subject, both runs were realigned (6 degrees of freedom) to a common reference space (7th volume of the first run) before the first-level GLM analyses. The transformation matrix for registration between functional and high-resolution anatomical volumes was calculated using boundary-based registration (bbregister, FreeSurfer, v6.0.0 16 ). Two participants had one of their two pain/treatment fMRI runs excluded from analysis due to excessive head motion, based on the following exclusion criteria: 1) >2˚ frame-by-frame head rotation in any direction, or 2) >2 mm frame-by-frame displacement. After excluding these data, mean frame-by-frame head rotation was 0.05±0.02˚ (mean±SD) and mean frame-by-frame displacement was 0.13±0.05 mm. An unpaired t-test indicated higher frame-by-frame displacement for patients (0.15±0.05 mm) relative to clinicians (0.11±0.04 mm; t=4.32, P<0.001), but there was no significant group difference for rotation (t=1.74, P=0.09). For registration from structural to standard space (MNI152), we used FSL's linear registration tool (FLIRT, 12 degrees of freedom) 12,17 , followed by FSL's nonlinear registration tool (FNIRT) 18 . All single-subject analyses were performed in functional space, and then registered to MNI152 standard space before dyadic and group analyses.
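The motion-exclusion criteria can be sketched as a simple check on MCFLIRT-style motion-parameter timecourses; the column ordering (rotations then translations) and the Euclidean definition of frame-by-frame displacement are assumptions for illustration.

```python
import numpy as np

def flag_high_motion(motion_params, rot_thresh_deg=2.0, trans_thresh_mm=2.0):
    """Flag a run for exclusion based on frame-by-frame motion.

    motion_params: (n_volumes, 6) array of MCFLIRT-style parameters;
    columns 0-2 are rotations in radians, columns 3-5 translations in mm
    (this column convention is an assumption).
    """
    # Frame-by-frame changes in each parameter
    diffs = np.abs(np.diff(motion_params, axis=0))
    # Largest single-axis rotation between consecutive frames, in degrees
    max_rot_deg = np.degrees(diffs[:, :3]).max()
    # Largest Euclidean translation distance between consecutive frames
    max_disp_mm = np.linalg.norm(diffs[:, 3:], axis=1).max()
    return max_rot_deg > rot_thresh_deg or max_disp_mm > trans_thresh_mm
```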