Neuroanatomical correlates of forgiving unintentional harms

Mature moral judgments rely on the consideration of a perpetrator’s mental state as well as the harmfulness of the outcomes produced. Prior work has focused primarily on the functional correlates of how intent information is neurally represented for moral judgments, but few studies have investigated whether individual differences in neuroanatomy can also explain variation in moral judgments. In the current study, we conducted voxel-based morphometry analyses to address this question. We found that local grey matter volume in the left anterior superior temporal sulcus, a region in the functionally defined theory of mind or mentalizing network, was associated with the degree to which participants relied on information about innocent intentions to forgive accidental harms. Our findings provide further support for the key role of mentalizing in the forgiveness of accidental harms and contribute preliminary evidence for the neuroanatomical basis of individual differences in moral judgments.


Supplementary Text S1: Scenario details
Scenario type by version breakdown. Red and green cells denote scenarios taken from previous studies (Cushman, 2008; Young, Camprodon, Hauser, Pascual-Leone, & Saxe, 2010). Note: The exact wording of the details can be found in the original papers or can be requested from the corresponding author. Italian translations are also available on request.

Supplementary Text S2: Component corresponding to the ToM network in gICA analysis
Why was gICA preferred over GLM for analyzing the ToM task data?
For 7 out of 49 participants, the MATLAB randomization for this task failed and the stimuli for each condition were therefore shown consecutively. This effectively turned our event-related design into a blocked design. Additionally, since the average length of an individual stimulus was 35 s, the length of each block was roughly 160 s (including ITI). This made our design highly inefficient for detecting any task-related signal in a General Linear Model (GLM) focusing on the ToM > control videos contrast, because dominant low-frequency noise overshadows the signal in such long blocked designs (optimal length: 16-50 s; Henson, 2007). Thus, to boost statistical power by using the entire sample, gICA (n = 49) was preferred over the standard GLM-based (n = 42) analysis.
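The efficiency argument can be illustrated with a back-of-the-envelope sketch that compares the fundamental frequency of a blocked design against a high-pass filter cutoff. The 128 s cutoff period and the simple alternation between two conditions are illustrative assumptions, not values reported here.

```python
# Back-of-the-envelope sketch; not the analysis code used in the study.

def fundamental_frequency_hz(block_length_s: float) -> float:
    """Fundamental frequency of a two-condition design alternating in equal blocks.

    One full cycle consists of two blocks (condition A followed by condition B).
    """
    return 1.0 / (2.0 * block_length_s)


# Assumed high-pass cutoff period (a commonly used value; not reported in the text).
ASSUMED_HIGHPASS_CUTOFF_S = 128.0
cutoff_hz = 1.0 / ASSUMED_HIGHPASS_CUTOFF_S

for block_s in (16.0, 50.0, 160.0):
    task_hz = fundamental_frequency_hz(block_s)
    below_cutoff = task_hz < cutoff_hz
    print(f"{block_s:5.0f} s blocks: task frequency {task_hz:.5f} Hz, "
          f"below assumed cutoff: {below_cutoff}")
```

Under these assumptions, the task frequency for 160 s blocks falls into the low-frequency range that is dominated by scanner drift and attenuated by the filter, whereas blocks of 16-50 s stay above the cutoff.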

Brief summary of rationale behind gICA
ICA was preferred over other approaches for localizing the functional network (e.g., seed-based correlation analysis) because it has several advantages over univariate approaches to functional connectivity in accounting for the artefactual influence of confounding signals (respiratory, cardiovascular, non-grey matter, etc.; Cole, Smith, & Beckmann, 2010). The group-ICA method, as implemented in GIFT, involves the following steps. First, data from all subjects are spatially normalized and dimensionally reduced by conducting PCA at the individual-subject level. All reduced datasets are then temporally concatenated to form one dataset, to which group-ICA is applied. Group-ICA decomposes this two-dimensional data matrix (with columns representing time course of voxels and rows representing different subjects) into two matrices, one corresponding to the time courses of the components for the group and the other corresponding to the spatial maps of the components, with a component loading for each voxel. Individual-level components are then created via the GICA back-reconstruction method, which is based on PCA compression and projection.
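The data-reduction and temporal-concatenation steps that precede the ICA decomposition can be sketched as follows. This is a minimal NumPy illustration on random toy data, not the GIFT implementation: the function name `pca_reduce`, the matrix sizes, and the numbers of retained components are all hypothetical, and the ICA step itself is deliberately left out.

```python
import numpy as np


def pca_reduce(data: np.ndarray, n_components: int) -> np.ndarray:
    """Reduce the row (time) dimension of a (rows x voxels) matrix via SVD-based PCA."""
    centered = data - data.mean(axis=0, keepdims=True)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    # Each retained row is one principal component expressed over voxels.
    return singular_values[:n_components, None] * vt[:n_components]


rng = np.random.default_rng(0)
n_subjects, n_timepoints, n_voxels = 4, 60, 500  # toy sizes (hypothetical)
n_subject_pcs, n_group_pcs = 10, 5               # hypothetical model orders

# Step 1: subject-level PCA on each (time x voxels) dataset.
subject_data = [rng.normal(size=(n_timepoints, n_voxels)) for _ in range(n_subjects)]
reduced = [pca_reduce(d, n_subject_pcs) for d in subject_data]

# Step 2: temporal concatenation of the reduced datasets.
concatenated = np.vstack(reduced)

# Step 3: group-level PCA; ICA would then be applied to this reduced matrix.
group_reduced = pca_reduce(concatenated, n_group_pcs)

print(concatenated.shape, group_reduced.shape)
```

Reducing each subject before concatenation keeps the group-level matrix small enough to decompose, which is the practical reason for the two-stage PCA.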

Identifying the ToM component
We closely followed the analysis protocol detailed in a previous study (Hyatt, Calhoun, Pearlson, & Assaf, 2015) and we provide here extensive details about the preprocessing pipeline, as data preprocessing can affect gICA results (Vergara, Mayer, Damaraju, Hutchison, & Calhoun, 2016). [...] (Allen et al., 2011), it has been shown that artefactual components often exhibit both low DR and low fALFF, and such components were removed from further analysis after visual inspection (Griffanti et al., 2016). Based on this analysis, 6 components were removed. Out of the 14 remaining biologically meaningful, non-artefactual components, the component corresponding to the ToM network was identified using the spatial correlation feature within GIFT, which identified the component (shown in the figure below) with the highest spatial correspondence (correlation: r = 0.4152; multiple regression: β = 0.1567) to the ToM meta-analytic functional map (Mar, 2011). Repeating the same step with a different meta-analytic functional map for ToM studies (Bzdok et al., 2012) identified the same component (correlation: r = 0.3135; multiple regression: β = 0.0983).
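The idea behind selecting a component by spatial correspondence can be sketched as a voxel-wise Pearson correlation between each component map and a template map. This is an illustrative NumPy version on synthetic data, not GIFT's own routine; the function names, the toy volume size, and the all-inclusive mask are assumptions, and the multiple-regression variant is not shown.

```python
import numpy as np


def spatial_correlation(component_map, template_map, mask):
    """Pearson correlation between two 3D maps, computed over in-mask voxels."""
    a = component_map[mask].astype(float)
    b = template_map[mask].astype(float)
    return float(np.corrcoef(a, b)[0, 1])


def best_matching_component(component_maps, template_map, mask):
    """Return the index of the component most correlated with the template, plus all scores."""
    scores = [spatial_correlation(m, template_map, mask) for m in component_maps]
    return int(np.argmax(scores)), scores


# Synthetic demonstration (hypothetical data): component 2 is a noisy copy of the template.
rng = np.random.default_rng(1)
shape = (8, 8, 8)
mask = np.ones(shape, dtype=bool)
template = rng.normal(size=shape)
components = [rng.normal(size=shape) for _ in range(4)]
components[2] = template + 0.5 * rng.normal(size=shape)

best_index, scores = best_matching_component(components, template, mask)
print(best_index, [round(s, 3) for s in scores])
```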

Coordinates for the ToM component
Significant clusters of activation from the component corresponding to the ToM network (see footnote 1), with anatomical labels derived using the Anatomy toolbox (Eickhoff et al., 2005), are provided in the table below.
1 In passing, we note that our findings contest the claim made by a previous study that mPFC is not necessary for mentalizing (Otti, Wohlschlaeger, & Noll-Hussong, 2015). That study used the same task as we did, but did not find any activation in mPFC. Importantly, it had only 20 participants. In the current study, we observed robust activation in the mPFC in both the model-free (n = 49) and the model-based (n = 42) analysis (as did Moessnang et al., 2016). Thus, we maintain that "absence of evidence was not evidence of absence", i.e., the study by Otti and colleagues failed to reject the null hypothesis in mPFC because of low statistical power stemming from its small sample size.

For the sake of brevity, we provide some additional details about the preprocessing of the anatomical data and the GLM modeling choices made during the VBM analysis.
• All images were inspected for common scanner artifacts by the authors (with the help of the following references: Graves & Mitchell, 2013; Stadler, Schima, Ba-Ssalamah, Kettenbach, & Eisenhuber, 2007) and for structural abnormalities by in-house physicians and technicians.
• During the segmentation routine, the intensity distribution for each tissue class was modeled using several Gaussians: two for GM, two for WM, two for CSF, three for bone, four for other soft tissues, and two for air (background).
• The images in native space were used for calculating total intracranial volume (TIV; by summing the tissue volumes for GM, WM, and CSF) with the help of the Tissue Volumes Utility of SPM12, which has been shown to be a highly reliable way to compute TIV in both control and clinical populations (Malone et al., 2015; Ridgway, Barnes, Pepple, & Fox, 2011; Sargolzaei et al., 2015).
• Normalization was carried out using the DARTEL toolbox because it has been shown to be a more sensitive approach to morphometry analyses than standard and optimized VBM (Li et al., 2013). Since DARTEL, as implemented here, was used to create a study-specific group template, it is important to note that the characteristics of the group can affect the final individual normalized tissue maps (Michael, Evans, & Moore, 2016). This can create artefactual group differences, but since the current study did not feature any such comparison, it is immune to this criticism.
• Spatial smoothing was applied to the normalized GM images in DARTEL space in order to (i) validate the choice of parametric tests, (ii) account for residual individual differences remaining from normalization (cf. Bookstein, 2001), and (iii) increase the signal-to-noise ratio (Kurth, Luders, & Gaser, 2015).
• Quality assurance review of the final smoothed GM images was performed using the VBM8 toolbox. Sample homogeneity was assessed using a covariance matrix, and volumes with an overall covariance more than two standard deviations below the mean were inspected further to ensure that they contained no abnormalities.
• For the regression models, none of the continuous covariates was mean centered, as the contrasts testing for the average effect were irrelevant for the VBM analysis (cf. http://mumford.fmripower.org/mean_centering/).
• To avoid activations lying outside of the brain (due to the low variance problem; Ridgway, Litvak, Flandin, Friston, & Penny, 2012) and to increase the power of the FWE correction by reducing the analysis region, we created a mask using the Masking Toolbox (Ridgway et al., 2009; http://www0.cs.ucl.ac.uk/staff/g.ridgway/masking/), which attempts to find the optimal threshold for binarizing an average (GM) image based on the correlation with that average image. Any voxel that fell outside of this mask was excluded from the analysis. This has been shown to be a more reliable approach (in terms of the likelihood of false negatives) than using an arbitrary threshold (e.g., 0.2) to remove voxels with intensity below this value (Ridgway et al., 2009).
• Recent work has begun to reveal that in-scanner motion of even a few mm per minute can severely affect morphometric estimates of GM (Alexander-Bloch et al., 2016), and, although prospective motion correction techniques have been introduced to account for such effects (Stucht et al., 2015), no such correction was implemented in the current study because the necessary equipment was not available (Maclaren, Herbst, Speck, & Zaitsev, 2013).
Additionally, time-of-day can affect all major tissue classes (GM, WM, CSF) such that apparent brain volume reduces from morning to evening (Nakamura, Brown, Narayanan, Collins, & Arnold, 2015), and this has greater impact on morphometric measures of frontal and temporal lobes (Trefler et al., 2016). Although we did not explicitly account for this effect, all participants were scanned during relatively fixed hours in the afternoon (15:00 to 19:00).
Thus, we acknowledge that in-scanner movement and time-of-day may have contributed to noise in our data, but we find a systematic biasing of our results to be unlikely.
• We note here that although we computed GMV using volume-based representation, the GMV computed using surface-based representation tends to be highly correlated with this method (Winkler et al., 2010). Additionally, it is interesting to note that variation in surface area and cortical thickness are two independent contributors to variation in GMV at both regional and local level, with surface area being the more significant contributor (Winkler et al., 2010). Thus, future studies can use surface-based morphometry techniques to investigate the same question. We would predict that a similar effect would be observed in l-aSTS: greater surface area would be associated with reduced moral condemnation for accidents.
• Why should we carry out VBM analysis? 2 One may wonder as to why we need to carry out VBM analysis [...] measured with MRI. Although the same holds true for functional neuroimaging, the advantage of VBM is that it allows experimenters to link an individual's performance measured in an ecologically valid environment to brain structure measurements. Such assessment can be difficult to implement in MRI scanner settings for certain tasks, for example tasks that require a participant to interact with multiple other individuals simultaneously, which therefore cannot be studied with fMRI (Camerer & Mobbs, 2017).
(ii) Since the heritability of a particular cognitive function (e.g., mental state reasoning) is contingent on the extent to which the respective brain structures (e.g., rTPJ) are influenced by genetic factors (Ge et al., 2016; Winkler et al., 2010), morphometry studies can be used to generate interesting hypotheses for multimodal studies linking function, structure, genes, and behavior.
(iii) Correlational data between GMV and performance can also be used to generate interesting hypotheses as to which brain regions might be important for the performance of the task. For example, in the current study, it was only after we found a VBM effect at aSTS that we decided to carry out a new analysis on our fMRI data to see if there was any correlation between functional activity in this region and performance. A further prediction can also be made for neurostimulation studies (TMS, tDCS), whereby disrupting the function of aSTS (mental state reasoning) would lead to reduced task performance (increased condemnation of accidental harm cases).
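The sample-homogeneity check mentioned in the quality-assurance step above can be sketched as follows. This is a minimal NumPy illustration on synthetic data, not the VBM8 routine: the function name `flag_low_covariance`, the toy sample, and the exact flagging rule (mean covariance with all other volumes more than two standard deviations below the sample mean) are assumptions made for illustration.

```python
import numpy as np


def flag_low_covariance(images: np.ndarray, n_sd: float = 2.0):
    """Flag volumes whose mean covariance with all other volumes is unusually low.

    images: array of shape (n_subjects, n_voxels), one flattened GM image per row.
    Returns the indices of flagged volumes and the per-volume mean covariance.
    """
    cov = np.cov(images)  # rows are treated as variables: (n_subjects x n_subjects)
    n = cov.shape[0]
    # Mean covariance of each volume with every other volume (diagonal excluded).
    mean_cov = (cov.sum(axis=1) - np.diag(cov)) / (n - 1)
    threshold = mean_cov.mean() - n_sd * mean_cov.std(ddof=1)
    return np.flatnonzero(mean_cov < threshold), mean_cov


# Synthetic demonstration (hypothetical data): volume 7 shares no structure with the rest.
rng = np.random.default_rng(42)
n_subjects, n_voxels = 30, 2000
shared_pattern = rng.normal(size=n_voxels)
images = shared_pattern + 0.5 * rng.normal(size=(n_subjects, n_voxels))
images[7] = rng.normal(size=n_voxels)

flagged, mean_cov = flag_low_covariance(images)
print(flagged)
```

Flagged volumes are candidates for visual inspection rather than automatic exclusion, in line with the procedure described above.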

Supplementary Text S4: Replicating the results with the GLM-based ToM mask
In this section, we show that even if we focus only on the 42 participants for whom the randomization did not fail, we still recover all primary regions of interest observed with gICA and also find VBM effects in the same region.

Replicating primary VBM results in the GLM-derived ToM mask
Here we focused only on the 42 participants for whom the MATLAB randomization did not fail and for whom the General Linear Model (GLM) approach was therefore appropriate for modelling the data (see main text). The ToM network was localized at the group level by entering the beta weights from the first-level canonical HRF contrasts into a one-sample t-test. Whole-brain analyses were thresholded at p < 0.05, Family-wise Error (FWE) corrected at the threshold level (cluster-defining threshold: p < 0.05 (corrected), extent threshold: k > 10). The results were masked with the meta-analytic map from Mar (2011). The results revealed the expected nodes of the ToM network, viz. bilateral temporoparietal junction (TPJ), sections of medial prefrontal cortex (mPFC), temporal poles (TP), superior temporal sulcus (STS), and precuneus (PC).

Note: The accompanying color bar in the figure below denotes t-values.
A list of all coordinates, along with labels derived from the Anatomy toolbox (Eickhoff et al., 2005), is provided below.

Table columns: MNI coordinate; Label (Anatomy toolbox); p (FWE-corrected); k
The same multiple regression models used in the main text to explore the association between GMV and moral condemnation at the voxel level were used here, with one crucial difference: for the image-based small volume correction, the ToM mask was derived from the GLM-analyzed localizer data instead of the gICA-analyzed data.
The results again revealed that more severe moral condemnation in the accidental harm condition was associated with reduced GMV in the left aSTS/MTG (x = -60, y = -12, z = -14; β = -0.0268; k = 17; p(FWE-corrected) = 0.0116). No such effect was found in the right aSTS/MTG, and no such result was found for any of the other conditions. Thus, the same result is obtained irrespective of whether the functional mask is derived from the GLM-based or the gICA-based analysis of the ToM localizer task.
VBM results at rTPJ: No association was observed between grey matter volume in rTPJ and moral judgment in any condition, even at a more liberal, uncorrected threshold of p < 0.001.

Supplementary Text S5: Descriptive statistics for moral condemnation
Below is a scatter plot illustrating the negative linear association between GMV in rTPJ and the severity of moral condemnation of accidental harm (ρ(47) = -0.316, p = 0.027, n = 49, two-tailed) at a highly liberal threshold (p < 0.05, uncorrected), for comparison with Figure 2 in the main text. The solid lines indicate a linear fit to the data, while the curved lines represent mean 95% confidence intervals for these lines. The extracted grey matter volume data presented in the figures are not independent of the statistical test conducted and should not be used for effect-size estimates (Vul & Pashler, 2017). They are included here only as a visual aid for interpreting the results.
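For readers unfamiliar with the ρ(df) notation, the following minimal sketch shows how a Spearman rank correlation and its degrees of freedom (n - 2, hence ρ(47) for n = 49) can be computed. It uses NumPy only and made-up toy values, not the study data; the function names are hypothetical, and the p-value computation is omitted.

```python
import numpy as np


def average_ranks(values) -> np.ndarray:
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks, with df = n - 2."""
    rx, ry = average_ranks(x), average_ranks(y)
    rho = float(np.corrcoef(rx, ry)[0, 1])
    return rho, len(rx) - 2


# Toy values (hypothetical): larger "volume" tends to go with lower "condemnation".
volume = [1.0, 2.0, 3.0, 4.0, 5.0]
condemnation = [5.0, 3.0, 4.0, 2.0, 1.0]
rho, df = spearman_rho(volume, condemnation)
print(f"rho({df}) = {rho:.3f}")
```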
fMRI results at rTPJ: For the ROI analysis, data were extracted from a spherical ROI with a radius of 8 mm at rTPJ [56, -56, 20] using MarsBar. This analysis revealed a negative correlation between the parameter estimate during the acceptability segment and reduced condemnation for accidents, but this effect was not significant (ρ(40) = -0.213, p = 0.088, n = 42, one-tailed).
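The construction of a spherical ROI like the one described above can be sketched in NumPy as follows. This is not the MarsBar implementation: the function name `sphere_mask` is hypothetical, and the 2 mm isotropic grid with its voxel-to-world affine is an assumed, commonly used MNI152 sampling rather than the image space of the current study.

```python
import numpy as np


def sphere_mask(shape, affine, center_mm, radius_mm):
    """Boolean mask of voxels whose centres lie within radius_mm of center_mm (world space)."""
    i, j, k = np.meshgrid(*[np.arange(n) for n in shape], indexing="ij")
    voxels = np.stack([i, j, k, np.ones_like(i)], axis=-1)  # homogeneous voxel coordinates
    world = voxels @ affine.T                               # voxel indices -> mm coordinates
    distance = np.linalg.norm(world[..., :3] - np.asarray(center_mm, dtype=float), axis=-1)
    return distance <= radius_mm


# Assumed 2 mm isotropic MNI-like grid (illustrative; not taken from the study).
shape = (91, 109, 91)
affine = np.array([
    [-2.0, 0.0, 0.0, 90.0],
    [0.0, 2.0, 0.0, -126.0],
    [0.0, 0.0, 2.0, -72.0],
    [0.0, 0.0, 0.0, 1.0],
])

roi = sphere_mask(shape, affine, center_mm=[56, -56, 20], radius_mm=8.0)
print(int(roi.sum()), "voxels in the ROI")

# Mean signal within the ROI for a hypothetical parameter-estimate image.
rng = np.random.default_rng(0)
beta_image = rng.normal(size=shape)
roi_mean = float(beta_image[roi].mean())
```

Averaging the parameter estimates over the ROI voxels yields one value per participant, which can then be correlated with behavior.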
Discussion: Why was the predicted effect not observed in the rTPJ?
As noted in the main text, we did not find any effect at the rTPJ (see footnote 3), which is surprising given the amount of research that places rTPJ at the center of morally relevant mental state reasoning (reviewed in Chakroff & Young, 2015). As argued in the main text, this can be due to the different aspects of ToM subserved by different regions (l-aSTS versus rTPJ), such that the aspect relevant for shaping the brain-structure relationship is neurally grounded in l-aSTS. Nevertheless, we entertain other mutually compatible explanations for this null effect below.
One plausible explanation is that the underlying assumption about the equivalence between structural and functional relationships is misplaced when it comes to mental state attribution. We would expect a VBM effect at rTPJ based on previous functional data (Chakroff et al., 2016; Young & Saxe, 2009).
Functional imaging studies deal with short-lived brain-behavior associations, while structural imaging studies deal with long-term brain-behavior associations, and it is possible that for some regions there is no homologous mapping of functional and structural variation, i.e., functional activation differences may (aSTS) or may not (rTPJ) be reflected in structural differences (and vice versa), at least at the macroscopic level (Kanai & Rees, 2011; Lewis, Kanai, Bates, & Rees, 2012). An important and unresolved issue in VBM research is determining which functions are linked to structural differences (Kanai & Rees, 2011); a systematic and principled investigation is in order.
3 It is noteworthy that morphometry studies exploring neuroanatomical correlates of ToM skills have found a wide variety of regions: some usual suspects, like TPJ and dmPFC (Rice & Redcay, 2015; Valk et al., 2017), and others that are not as extensively discussed in the ToM literature, e.g., anterior temporal lobes (Irish, Hodges, & Piguet, 2014), right inferior frontal gyrus (Rice & Redcay, 2015), ventrolateral prefrontal cortex (Hirao et al., 2008), amygdala (Rice, Viscomi, & Riggins, 2014), middle temporal gyrus [...]
Another possible explanation involves our choice of ToM functional localizer task. Note that while the localizer task featuring social animations recruits spontaneous mental state attribution to interacting triangles, the moral judgment task required false belief reasoning. Thus, there may have been a mismatch between the assumed functional role of the rTPJ across the two tasks. Indeed, our choice of localizer task [...]
Yet another possibility is that no effect at aSTS was found in previous functional studies because most prior investigations have focused on a few ROIs (dmPFC, vmPFC, bilateral TPJ, PC) while ignoring the rest of the ToM network (cf. Chakroff et al., 2016; Koster-Hale, Saxe, Dungan, & Young, 2013; Young & Saxe, 2008). The current investigation was unbiased in this sense, since we interrogated the entire ToM network localized using gICA and did not focus only on a few key nodes of the network.
In passing, we also note that previous morphometry studies examining how regional variation in brain structure relates to individual differences in endorsed moral values (Lewis et al., 2012), moral reasoning skills (Prehn et al., 2015), prosocial behavior (Marsh et al., 2014; Thijssen et al., 2015; Yamagishi et al., 2016), and moral judgments in clinical populations (Baez et al., 2015, 2016) have not found any correlations with GMV or cortical thickness of the TPJ region. To our knowledge, only one study has found any effect in this region: a positive correlation between rTPJ and altruistic decision making (Morishima, Schunk, Bruhin, Ruff, & Fehr, 2012).