Meta-analysis of neural systems underlying placebo analgesia from individual participant fMRI data

The brain systems underlying placebo analgesia are insufficiently understood. Here we performed a systematic, participant-level meta-analysis of experimental functional neuroimaging studies of evoked pain under stimulus-intensity-matched placebo and control conditions, encompassing 603 healthy participants from 20 (out of 28 eligible) studies. We find that placebo vs. control treatments induce small, widespread reductions in pain-related activity, particularly in regions belonging to ventral attention (including mid-insula) and somatomotor networks (including posterior insula). Behavioral placebo analgesia correlates with reduced pain-related activity in these networks and the thalamus, habenula, mid-cingulate, and supplementary motor area. Placebo-associated activity increases occur mainly in frontoparietal regions, with high between-study heterogeneity. We conclude that placebo treatments affect pain-related activity in multiple brain areas, which may reflect changes in nociception and/or other affective and decision-making processes surrounding pain. Between-study heterogeneity suggests that placebo analgesia is a multi-faceted phenomenon involving multiple cerebral mechanisms that differ across studies.

: clusters of placebo-treatment induced reductions in pain-related activity - full sample, random effects analysis 47
Supplementary Table 11A: clusters of placebo-treatment induced increases in pain-related activity - full sample, fixed effects analysis 48
Supplementary Table 11B: clusters of placebo-treatment induced reductions in pain-related activity - full sample, fixed effects analysis 49
Supplementary Table 12: clusters showing a significant negative correlation between brain activity and behavioral placebo analgesia - full sample (excluding between-subject studies), random effects analysis 50
Supplementary References 51

Study identification
Study identification procedures have previously been described in the Supplement of Zunhammer et al. (2018) 1 and are repeated below for convenience.

Criteria for study eligibility
Studies were defined eligible if…
a) …published in a peer-reviewed journal in the English language
b) …based on an original investigation
c) …including human participants
d) …obtaining functional neuroimaging data of the brain during evoked pain
e) …involving pain delivered under stimulus-intensity-matched placebo and control conditions, where "placebo treatment" was defined as any condition where the experimental context suggested that an effective analgesic treatment was applied, including verbal suggestions and conditioning procedures that reinforced participants' expectations of reduced pain, following the categorization of placebo paradigms introduced in Ref. 2 . Accordingly, non-placebo control conditions that involved no treatment, ineffective treatment, hidden (in contrast to open) treatment, or unconditioned (in contrast to conditioned) treatment were considered eligible.

Original study identification
Studies were identified through the following sources:
a) an initial online search of the electronic bibliographic database MEDLINE via PubMed on May 21st 2015 using the search term: ((placebo effect[Title/Abstract]) OR placebo analgesia[Title/Abstract]) AND fMRI OR PET.
b) by enriching initial search results with studies identified in an earlier meta-analysis of author TW 2,3 . Search results in these preceding peak-voxel-based meta-analyses had been "identified using literature searches in PubMed and Google Scholar, the authors' personal libraries, and examining references of relevant papers."
c) through recommendations by collaborating investigators.
Identified studies are listed in Supplementary Table 1; the data-acquisition process is illustrated in Supplementary Figure 1. Authors MZ, UB, and TW screened the titles and abstracts of all records retrieved; studies that provisionally met the eligibility criteria were assessed by examining the full text. Study eligibility was determined, and agreement between reviewers accomplished, in a joint discussion of authors MZ, UB, and TW. There were no studies where the decision for inclusion/exclusion was ambiguous (see Supplementary Table 1).

Post-hoc study identification
An exploratory post-hoc literature search was performed on March 10th 2018 to account for the considerable time that had passed between the initial study search and the completion of the meta-analysis. We searched PubMed and Thomson Reuters Web of Science from the beginning of 2015 to the present day using an extended set of search terms. Studies 4-9 (with a total N of 196) were published after the initial study search in 2015 and were therefore missed by the present meta-analysis (Supplementary Table 1).

Risk-of-bias-assessment
Risk-of-bias identification procedures were re-applied analogously to Ref. 1 (Supplement), with the difference that we assessed the risk of bias regarding voxel-wise whole-brain activity. Note that most risks of bias apply to both meta-analyses, regardless of the target outcome; the assessment below is therefore largely a replication of our earlier assessment, and conclusions were largely identical with respect to performance bias, detection bias, and study reporting bias. Author MZ evaluated each study with respect to selection bias, performance bias, attrition bias, detection bias, reporting bias, and biases introduced by the use of within-subject designs (sequence effects) using the Cochrane risk of bias tool 10 . All judgments were based on single-subject data, information taken from the published manuscripts, or personal communication with the study authors, following this order of priority.

Selection bias
Non-random sampling and group allocation of research participants can be a considerable source of bias. While the issue is of major importance in between-group designs, requirements are relaxed in within-subject designs, as all participants undergo both treatments 11 . Most studies in our sample followed a within-subject design. In summary, selection bias due to non-random allocation of participants to placebo/control conditions was judged as low in most studies (Supplementary Table 3).

Performance bias
Awareness of the allocated experimental condition by participants and study personnel is considered the major source of performance bias in clinical trials 10 . However, the issue of blinding in experimental placebo research is controversial: the knowledge of being treated is considered constitutive of the placebo phenomenon 12 . Further, the treatment provider and her behaviour are seen as major factors driving the placebo effect 13 . Placebo studies with blinded study participants or treatment providers 12 may therefore underestimate the placebo effects typical of clinical settings. On the other hand, the fact that full blinding is conceptually difficult in experimental placebo studies does not imply that performance bias is not a problem 12 . The lack of blinding in placebo studies makes it difficult to discern "true" placebo effects, i.e. perceived and actual symptom improvements, from "false" placebo effects, i.e. apparent improvements due to demand characteristics / altered reporting behaviour 12 . Thus, so-called "demand characteristics" (participants' tendency to report what they believe they should report, independent of experience) and other biases in judgement and decision making can influence behavioural placebo effects, which is a major reason to also examine physiological outcomes.
No studies in our sample blinded participants or experimenters, with the exception of one between-group study that blinded subjects with respect to group allocation 14 . We therefore concluded high risk of performance bias for the present meta-analysis, as voxel-wise brain activity related to demand characteristics cannot be discerned with certainty from brain activity related to placebo analgesia.

Detection bias
It is a common problem in neuroimaging research that image pre-processing pipelines and statistical analyses involve numerous analysis choices. These not only tempt analysts to cherry-pick favourable results, but also pose a multiple-comparison problem 15 . Blinding of analysts to the nature of experimental conditions and prespecification of analysis parameters could prevent this type of bias. No included study reported analyst blinding (Supplementary Table 3). Moreover, the pre-processing pipelines and 1st-level models of imaging analyses varied considerably (Supplementary Table 5). Since our meta-analysis relies on the original first-level analyses, choices by the original analysts may affect results of the present meta-analysis. Analysis pipelines may have been chosen so as to favour some brain regions over others. We therefore judged the risk for detection bias as high.

Attrition bias
Study drop-out and exclusion of participants may systematically affect study outcomes, especially when one experimental condition is affected more than another, or when participants are selected based on outcomes. Supplementary Table 7 provides a general overview of the amount of missing imaging data with respect to the different experimental stages of the original studies. For one study 16 insufficient information was available to determine these figures. For the remaining studies, our meta-analysis included 84% of the participants enrolled in the original studies, 95% of participants successfully completing fMRI testing, and 99% of subjects included in the original analyses. The main reasons for the discrepancy between participants tested and participants completing measurements were problems with neuroimaging and pain-stimulation equipment, which are unlikely to affect our outcome systematically. The main reasons for the discrepancy between participants completing measurements and participants analysed in the original studies were exclusions due to imaging artefacts and excessive head movements, which are also unlikely to affect placebo effects systematically. Data from 6 out of 16 subjects for one study 17 and 2 out of 19 subjects for another 18 were unavailable due to a failure of data storage. Given the relatively low attrition rate and the fact that most studies are within-subject studies, where missing participants affect all experimental conditions alike, we conclude that attrition bias is unlikely to affect the outcomes of our meta-analysis.

Study reporting bias
The underreporting of studies with non-significant ("negative") results is a prevailing problem in biomedical research 19 that has been suggested to affect experimental placebo research 12 . Underreporting of studies with non-significant behavioural placebo effects may inflate the effect sizes of our current meta-analysis by biasing the study sample towards placebo responders. Further, imaging studies yielding no activation clusters, or clusters in unorthodox regions, may have been underreported, although we are not aware of such a case. We therefore conclude that the risk of reporting bias was unknown for the present analysis. Of note, the present study is based on single-subject whole-brain summary images, as obtained in the original analyses. The non-reporting of peak activations is therefore not a problem, and consequently the risk of reporting bias of the present analysis is lower than in previous peak-based meta-analysis approaches (e.g. Ref. 2 ).
Other biases: unbalanced testing sequence in within-subject designs
Sequence effects (e.g. habituation or sensitization) may confound treatment effects in within-subject designs when the order of experimental conditions is not balanced or randomized. An overview of the sequence of treatment conditions in within-subject studies is provided in Supplementary Table 3. Single-subject data on the sequence of conditions were available for all but three studies; of these, two studies reported balanced testing 20 , and for one study no information about the testing sequence was available 21 . Several studies tested placebo and control conditions in an alternating fashion, reducing the risk of sequence confounds 16,17,[22][23][24] . Two studies tested placebo and control conditions in a fixed pre-placebo (control) vs. post-placebo sequence 18,25 ; these studies were excluded from the conservative analysis. All remaining studies had balanced designs with respect to the sequence of placebo and control. Overall, sample imbalance across studies was low: placebo conditions were tested after control conditions in 54% of participants. Based on these figures, we judged the overall risk of bias due to an unbalanced testing sequence as low.

Risk-of-bias summary
In summary, we concluded high risk of bias for voxel-wise brain activity. The main reasons for this conclusion were the unresolved issue of distinguishing real placebo analgesia from reporting bias, and the risk that detection bias due to non-blinding of analysts affected results.

A note on external validity
The Cochrane risk-of-bias tool focusses on the assessment of internal study validity. Beyond this tool, we identified an issue of external validity that may affect the conclusions of the present meta-analysis. Two studies 20,26 (accounting for 20.7% of the total sample, see Supplementary Table 3) pre-selected placebo responders. This practice constitutes no bias in terms of internal validity, but limits the generalizability of results. Mixing studies with and without responder selection in a meta-analysis may entail an over-representation of placebo responders and therefore inflate our effect size estimates.

General
The present analysis was not pre-registered, but was performed according to the analysis plan for Zunhammer et al. (2018) 5 (see https://osf.io/n9mb3/), with the difference that single-voxel brain responses, not NPS responses, were the main outcome. Of note, statistical thresholds were not pre-defined in the original analysis plan. We therefore provide maps for several established thresholding methods, i.e. uncorrected at p < .001 (parametric p-values) and family-wise error (FWER) corrected at p < .05 (non-parametric permutation-based p-values), each with and without probabilistic threshold-free cluster enhancement 6 . All analyses were performed with MATLAB (v 2016b). All images were re-sliced to a voxel size of 2*2*2 mm using SPM 12's imcalc function before further analysis. The meta-analysis was based on the algorithms used in Cochrane's RevMan 5 28 , implemented as custom MATLAB functions. The functions and the complete analysis are available at: www.github.com/mzunhammer/PlaceboMetaAnalysis.

Brain coverage
Binary signal/no-signal masks were created for each subject. The resulting voxel-coverage maps were summarized within and across studies to determine the available sample size/missing data at each brain voxel. Brain voxels representing fewer than four participants were excluded at study level. Subsequently, brain voxels missing in > 10% of participants (total sample) were excluded from further analysis to keep the sample size comparable across the brain. The decision to exclude such voxels was not pre-established before analysis. The coverage map for the full sample is shown in Supplementary Figure 2. The study-level coverage after excluding missing voxels is shown in Supplementary Figure 3.

Image alignment
We checked alignment to Montreal Neurological Institute (MNI) space and image coverage by visually comparing binary signal/no-signal masks and study summaries for pain > baseline against the standard MNI template supplied with SPM (avg152T1.nii). All studies showed satisfactory alignment with the template upon visual inspection, with no single-participant outliers.
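The two coverage-exclusion rules (at least four participants per study, and at most 10% missing in the total sample) can be sketched as follows. This is a minimal numpy illustration with synthetic masks, not the original MATLAB implementation; the function and variable names are ours:

```python
import numpy as np

def coverage_mask(study_masks, min_per_study=4, max_missing_frac=0.10):
    """Combine per-subject signal/no-signal masks into an analysis mask.

    study_masks: list of (n_subjects, n_voxels) boolean arrays, one per study.
    Returns a boolean vector of voxels retained for further analysis.
    """
    # Study level: drop voxels representing fewer than `min_per_study` participants
    kept = [m & (m.sum(axis=0) >= min_per_study) for m in study_masks]
    total_n = sum(m.shape[0] for m in study_masks)
    covered = sum(m.sum(axis=0) for m in kept)
    # Sample level: drop voxels missing in more than `max_missing_frac` of all participants
    return covered >= (1 - max_missing_frac) * total_n

# Synthetic example: three studies of 30 subjects, 1000 voxels, ~5% missing per subject
rng = np.random.default_rng(0)
masks = [rng.random((30, 1000)) > 0.05 for _ in range(3)]
keep = coverage_mask(masks)
print(f"{keep.sum()} of {keep.size} voxels retained")
```

Voxels surviving both rules define the common analysis space, keeping the effective sample size comparable across the brain.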

Quality control of image signal
Correct data labelling was ascertained in correspondence with the original authors. Outlier screening for excessive random error in the imaging signal was guided by the assumption that imaging and statistical artefacts should mainly affect the absolute and relative signal intensities of grey matter, white matter, CSF, and extra-cerebral signal. Raw and absolute parameter estimates for each tissue were obtained by calculating the dot product of each individual image with SPM8's tissue probability maps grey.nii, white.nii, csf.nii, and (inverted) brainmask.nii. Mahalanobis distance and scatterplots were then used to identify suspect cases on a within-study basis. Further, the design matrices (SPM.mat, design.mat) used for first-level analysis in the original analyses were evaluated for irregularities, where available. Outlier screening identified 63 cases showing unusual absolute and/or relative activity in white matter, grey matter, CSF, or extra-cerebral space. These suspect images underwent further evaluation using histograms and visual examination. In total, 12 subjects were confirmed as outliers, showing radio-frequency-, magnetic-susceptibility-, or spike-like artifacts (6), extreme values (4), or evidence for errors in the original design matrices (SPM files) (2). Outliers were retained in the full sample, but excluded from the conservative analysis.
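The Mahalanobis-distance screening step can be illustrated as follows. This is a simplified numpy sketch; the feature matrix layout and the chi-square cutoff are our assumptions for illustration, not parameters reported above:

```python
import numpy as np

def mahalanobis_outliers(features, cutoff=18.47):
    """Flag suspect subjects within a study.

    features: (n_subjects, n_features) matrix, e.g. mean signal in grey
    matter, white matter, CSF, and extra-cerebral space per subject.
    cutoff: squared-distance criterion; 18.47 is the chi-square 99.9th
    percentile for 4 degrees of freedom (an assumed threshold).
    """
    centered = features - features.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(features, rowvar=False))
    # Squared Mahalanobis distance of each subject from the study centroid
    d2 = np.einsum('ij,jk,ik->i', centered, cov_inv, centered)
    return d2 > cutoff

# Synthetic example: 40 subjects, 4 tissue features, one planted artefact
rng = np.random.default_rng(1)
feats = rng.normal(size=(40, 4))
feats[0] += 10  # e.g. a spike-like artefact inflating all tissue signals
flags = mahalanobis_outliers(feats)
print("suspect subjects:", np.flatnonzero(flags))
```

Flagged cases would then proceed to the histogram and visual checks described above rather than being excluded automatically.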

Smoothing
The statistical summary images collected differed in terms of image smoothness (see Supplementary Table 5). Between-study differences in smoothing kernel may negatively impact the comparability of single studies and the statistical weight of individual studies in the meta-analysis. However, no measures were taken to equalize image smoothness before meta-analysis, based on the following considerations:
• The main purpose of equalizing image smoothness is to achieve better comparability of studies 8 . However, the present study primarily aimed to summarize brain activity across studies, not to make comparisons between individual studies.
• "One disadvantage of post hoc smoothness equalization is that it requires that all scanners be smoothed to that of the most smooth scanner in the set" 8 . Equalizing smoothness would entail a loss in statistical power and mapping accuracy.

9/53
Presentation of pain vs baseline contrast
For the pain vs baseline comparison we pooled placebo and control conditions, based on four considerations:
1. For some studies 29 only pooled estimates of the main effect of pain were available, so a map based on control images only would not reflect the complete sample.
2. The pooled map is optimal for comparison with the placebo contrast, as the two contrasts are orthogonal 30 . Comparisons based on the control condition only would be biased, as they would reflect peculiarities of the control condition.
3. Pooling reduces within-subject variance, thereby increasing the robustness of results.
4. The range of effect sizes observed for the pain vs baseline comparison was about 7 times greater than that observed for the placebo vs control comparison, so placebo-related effects do not affect the visualization of the pain vs baseline comparison at large.

Meta-analysis
For outcome comparisons within studies we used Hedges' g, defined as (mean difference / standard deviation (SD)) * J, where J is a correction factor for small-sample bias (J = 1 − 3/(4*df − 1)) 10 . For within-subject studies Hedges' grm was used, defined as (mean within-subject difference / SDdiff) * sqrt(2*(1 − r)) * J, where SDdiff is the SD of within-subject differences and r is the correlation between repeated measures 31,32 . For three studies (Supplementary Figure 6), imaging data were only available as separate contrasts for pain activation and placebo conditions, so the within-subject correlation r of pain-related activity under placebo and control conditions could not be computed. For these studies, Hedges' grm was obtained by imputing the mean within-subject correlation observed across all other within-subject studies. Treatment effects, and correlations between cerebral treatment effects and ratings, were summarized across studies using the generic inverse-variance (GIV) weighting method with DerSimonian and Laird random effects 28,32 . Fisher's Z-transformation was applied before and after summarizing correlations 32 . Significance thresholds (α < .05) and p-values corrected for multiple comparisons at the family-wise error level (pFWER) were obtained by performing a non-parametric Monte-Carlo (2000 re-samples) permutation test based on the maximum z-score, corresponding to the maximum-t approach described by Nichols and Holmes (2002) 33 . To determine significance thresholds (α < .05) and p-values corrected for multiple comparisons for the between-study heterogeneity estimates, the same permutation approach was applied to the maximum-Q (χ2) statistic 10 .
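The effect-size and pooling computations described above can be sketched as follows. This is a minimal numpy re-implementation for illustration (function names are ours; the actual analysis used the authors' MATLAB functions):

```python
import numpy as np

def hedges_g_rm(diff, r):
    """Hedges' grm for repeated measures: (mean difference / SD of
    differences) * sqrt(2*(1 - r)) * J, with small-sample correction
    J = 1 - 3/(4*df - 1), df = n - 1."""
    n = len(diff)
    j = 1 - 3 / (4 * (n - 1) - 1)
    return np.mean(diff) / np.std(diff, ddof=1) * np.sqrt(2 * (1 - r)) * j

def dersimonian_laird(effects, variances):
    """Generic inverse-variance (GIV) pooling with DerSimonian-Laird
    random effects; returns the pooled effect, its SE, and tau^2."""
    effects, variances = np.asarray(effects), np.asarray(variances)
    w = 1 / variances                                   # fixed-effect weights
    mean_fe = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - mean_fe) ** 2)            # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)       # between-study variance
    w_re = 1 / (variances + tau2)                       # random-effects weights
    pooled = np.sum(w_re * effects) / np.sum(w_re)
    return pooled, np.sqrt(1 / np.sum(w_re)), tau2

def pool_correlations(rs, ns):
    """Summarize correlations: Fisher's Z before pooling, back-transform after."""
    z, var = np.arctanh(np.asarray(rs)), 1 / (np.asarray(ns) - 3)
    pooled_z, _, _ = dersimonian_laird(z, var)
    return np.tanh(pooled_z)

# Illustrative numbers (not study data)
g = hedges_g_rm(np.array([1.0, 1.2, 0.8, 1.1, 0.9, 1.0]), r=0.5)
pooled, se, tau2 = dersimonian_laird([0.20, 0.25, 0.22], [0.01, 0.01, 0.01])
print(g, pooled, se, tau2)
```

With homogeneous study effects, Q falls below its degrees of freedom, tau^2 is truncated at zero, and the random-effects estimate reduces to the fixed-effect (GIV) estimate.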

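The maximum-statistic permutation correction for the family-wise error rate can be sketched as follows. This is a simplified sign-flipping variant for a one-sample summary of study-level maps, i.e. our illustrative reduction of the Nichols and Holmes approach, not the exact pipeline:

```python
import numpy as np

def maxstat_fwer_threshold(study_maps, n_perm=2000, alpha=0.05, seed=0):
    """study_maps: (n_studies, n_voxels) study-level effect maps.
    Builds the permutation null of the maximum |z| across voxels by
    randomly sign-flipping whole studies, and returns the critical
    value controlling the family-wise error rate at `alpha`."""
    rng = np.random.default_rng(seed)
    n_studies = study_maps.shape[0]
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(n_studies, 1))
        z = (signs * study_maps).mean(axis=0) * np.sqrt(n_studies)
        max_null[i] = np.abs(z).max()          # maximum statistic over voxels
    return np.quantile(max_null, 1 - alpha)

# Null example: 20 studies, 500 voxels of pure noise
rng = np.random.default_rng(7)
thr = maxstat_fwer_threshold(rng.normal(size=(20, 500)), n_perm=500)
print(f"FWER-corrected critical |z|: {thr:.2f}")
```

Because the null distribution is built from the maximum over all voxels, any voxel exceeding the returned critical value is significant at the FWER-corrected level without further adjustment.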
Labelling of outcome clusters
The fsl (version 5.0.10) function "cluster", as implemented in the atlasquery automation script (autoaq), was used to automatically label thresholded summary images. The Harvard Cortical and Subcortical Atlases 34 , the Oxford Thalamic Connectivity Atlas 35 , the Probabilistic Cerebellar Atlas 36 , and the Talairach Daemon (TD) Atlas 37 were used in this order of preference (as provided in fsl 5.0.10). Labels with a probability < .1 were omitted. White-matter labels were omitted for brevity, except when no non-white-matter label with a probability > .1 was available (low tissue probability implies white matter).

Responder analysis
We initially planned another analysis including only participants showing an above-median behavioural placebo response for each study ("responder analysis", see https://osf.io/n9mb3/). However, we replaced this analysis with the correlation analysis of behavioural and cerebral placebo responses, as the dichotomization of continuous outcomes is suboptimal in terms of statistical power and can yield misleading results 38 .
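The power cost of dichotomization can be illustrated with a quick simulation (our illustrative example, not taken from Ref. 38): median-splitting a continuous predictor attenuates the observed association by roughly a factor of 0.8 in the bivariate-normal case.

```python
import numpy as np

# Simulate a continuous behavioural placebo response (x) that correlates
# r = 0.5 with some brain measure (y)
rng = np.random.default_rng(0)
n, r = 100_000, 0.5
x = rng.normal(size=n)
y = r * x + np.sqrt(1 - r**2) * rng.normal(size=n)

r_full = np.corrcoef(x, y)[0, 1]

# "Responder analysis": dichotomize x at the median, then correlate
responders = (x > np.median(x)).astype(float)
r_split = np.corrcoef(responders, y)[0, 1]

print(f"continuous r = {r_full:.3f}, median-split r = {r_split:.3f}")
```

With a true r of 0.5, the median-split correlation comes out near 0.5 * sqrt(2/pi) ≈ 0.40, i.e. a substantial loss of information and, consequently, of statistical power.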

Supplementary Figures
Supplementary Figure 1: CONSORT flowchart of data-acquisition * IPD for all eligible studies were sought. ** All available studies were analyzed.

Supplementary Figure 2: brain-coverage by number of subjects
Number of participants with non-missing data (full sample), voxel-wise, projected onto the MNI152 brain template. Areas with more than 10% missing subjects were excluded from further analysis. The full-sample analysis was based on 191118 brain voxels (2*2*2 mm). n = 543 to 603 individuals from 17 to 20 independent studies per voxel. Source data (results as 3d-volumes) are provided at https://osf.io/n9mb3/.

Supplementary Figure 4: placebo-induced changes in pain-related activity (conservative sample)
Standardized effect size g for the contrast pain_placebo − pain_control. Sagittal sections cut the hemisphere proximal to the viewer.

By-subject correlation between behavioral placebo analgesia (pain_control − pain_placebo) and placebo-related activity changes (pain_placebo − pain_control). Conservative sample excluding between-group studies (individual estimates of behavioral placebo analgesia not possible), high risk-of-bias studies, and outlier subjects. Sagittal sections cut the brain hemispheres proximal to the viewer. A: for the contrast pain stimulation > baseline (pooled across placebo and control conditions). B: regions of statistically significant between-study heterogeneity (permutation test, controlled for FWER, one-sided p < .05). n = 543 to 603 individuals from 17 to 20 independent studies per voxel. Source data (results as 3d-volumes) are provided at https://osf.io/n9mb3/.

Supplementary Tables

Supplementary  All studies obtained blood-oxygenation-level-dependent (BOLD) signal using echo-planar imaging (EPI) variants, except for the study marked with an asterisk (*), which obtained arterial spin labeling (ASL) data. For spatial smoothing, a Gaussian kernel filter was used in all studies; the full-width-at-half-maximum kernel size is provided in mm. All information was obtained from the original publications and (where available) from analysis files (e.g. SPM.mat or design.mat). Source data (results as 3d-volumes) are provided at https://osf.io/n9mb3/. Abbreviations: A, sub-study 1; B, sub-study 2; CSF, cerebrospinal fluid; hrf, hemodynamic response function; ICA, independent-component analysis used for temporal noise filtering; LP, temporal low-pass filter; NA, not applicable; TD, temporal derivative; TE, echo time; temp, temporal; TR, repetition time; WM, white matter.

Supplementary For studies marked with an asterisk (*), imaging data were only available as separate contrasts for pain activation and placebo conditions, which could not be re-combined post hoc. Consequently, the within-subject correlations necessary to estimate Hedges' grm could not be obtained. We therefore imputed the mean correlation observed across all other within-subject studies in these cases.

Supplementary  Significant clusters of correlation between brain activity (pain_placebo − pain_control) and placebo analgesia (pain_control − pain_placebo) at a threshold of pFWER < .05, corrected for multiple comparisons.

Supplementary Significant clusters of activation and de-activation for the contrast pain_placebo − pain_control at a threshold of pFWER < .05, corrected for multiple comparisons. Note that fixed-effects analysis does not account for between-study differences in effect sizes. Cluster labels are provided with probability estimates from the Harvard (Sub-)Cortical (unmarked), Probabilistic Cerebellar (º), or Talairach (*) atlas. [Square brackets] denote comments. "Size" denotes cluster size in voxels of 2*2*2 mm; all other parameters refer to the peak voxel. Source data (results as 3d-volumes) are provided at https://osf.io/n9mb3/. Abbreviations: g, gyrus; hem, hemisphere; L, left; perm, permutation test; PL, posterior lobe; post, posterior; R, right; sup, superior; TOFC, Temporal Occipital Fusiform Cortex; WM, white matter.