Mapping brain structural differences and neuroreceptor correlates in Parkinson's disease visual hallucinations: a mega-analysis

Parkinson's psychosis (PDP) describes a spectrum of symptoms that may arise in Parkinson's disease (PD) including visual hallucinations (VH). Imaging studies investigating the neural correlates of PDP have been inconsistent in their findings, due to differences in study design and limitations of scale. Here we use empirical Bayes harmonisation to pool together structural imaging data from multiple research groups into a large-scale mega-analysis, allowing us to apply new methodological approaches to identify cortical regions and networks involved in VH and their relation to receptor binding. Differences of cortical thickness and surface area show a wider cortical involvement underlying VH than previously recognised, including primary visual cortex and its surrounds, and the hippocampus, independent of its role in cognitive decline. Structural covariance analyses point to a strong involvement of the attentional control networks in PD-VH, while associations with receptor density maps suggest neurotransmitter loss may drive the cortical changes.


Introduction
Parkinson's disease (PD) is a neurodegenerative disorder primarily characterised by motor symptoms, mainly related to the loss of neurons in the substantia nigra projecting to the basal ganglia (Dickson et al., 2009). Patients with PD commonly experience a variety of non-motor symptoms, including psychiatric ones (Schapira et al., 2017). Among these, visual hallucinations (VH) and related visual phenomena form a spectrum of symptoms referred to as Parkinson's psychosis (Ravina et al., 2007) There is a continuum of experiences typically characterising PDP with patients initially experiencing minor hallucinations (perception of presence or passage) and illusions that progress to formed hallucinations Imaging studies of VH in PD to date have been based on relatively small samples and have used differing designs that variously control for the degree of cognitive decline, stage of PD and dopamine medication. This makes it difficult to disentangle brain changes related specifically to VH mechanisms as distinct from those related to cognitive decline, PD stage or medication effects. As a result, a heterogeneous array of structural differences has been reported. Depending on whether or not cognition is controlled for, this view. Current consensus is that dopaminergic medication interacts with disease-related susceptibility factors in PD to cause VH, rather than as a simple side effect (Ravina et al., 2007). Cholinergic pathways have also been implicated in VH (e.g. Janzen et al., 2012;Collerton et al., 2005), with neurodegeneration in brainstem and forebrain cholinergic nuclei (Janzen et al., 2012)  In summary, our mega-analysis of PD with VH compared to PD without VH enables analyses that are not available to smaller scale studies to help explore the mechanisms of VH. Specifically, we are able to determine the regional cortical thickness and surface area changes associated with VH and relate these morphometric features to measures of symptom severity in a subgroup where finer-grain clinical detail is available. We perform a principal component analysis to identify smaller-scale morphometric differences within a high dimensional set of regions. In addition, we perform an exploratory structural network analysis to highlight associations between regions and clusters of connections linked to VH. Structural covariance allows us to assay covariation of differences in grey matter morphology between different brain structures, providing information on which regions similarly change in thickness or surface area. In order to understand the neurochemical associations of these changes, we also test the hypothesis that structural differences are related to the spatial variation in subtypes of receptors for which high resolution PET atlases are available (dopamine and serotonin).

2.1
Patient characteristics demonstrated we have good matching on the criteria, the mega-analysis ANOVA shows that there is a difference of 2. 19 (Baig et al., 2015), T1 data (Griffanti et al., 2020) separately published, but not in a publication studying them together.

Hallucinators (PD-VH) vs. non-hallucinators (PD-noVH) multivariate analysis of variance.
Cortical Thickness. Lower thickness in PD-VH was present in a widespread set of regions (see Figure 1 and S3 Surface area. We found reduced area in PD-VH mainly in frontal and occipital regions (see Figure   1) (for all tables and details see S3 The same models were carried out also for subcortical volumes not yielding significant results for all receptors (see S6).

Principal components analysis (PCA).
We performed PCA to reduce the dimensionality of the dataset while preserving variability, to identify underlying clusters to clarify the results from the group-level analyses. For Dimension 2, the contributing regions were the left central insular area, the anterior and superior portions of the circular sulcus of the insula (Figure 3.b, S7).
For cortical thickness only, we found a significant inverse correlation of Dimension 1 individual contributions and NPI score (r = .-138, p = .049). In addition, the thickness for the Dimension 1 regions (left SFG, MFG, precentral) negatively correlated with NPI score (r =-.15, p = .046, one-tailed Pearson correlation), with the individuals having higher pathological score having also the lower thickness in these regions. No significant results were found for surface area, as in the NPI correlational analyses previously reported.

Structural covariance analysis.
To explore and characterise the gray matter network-level organisation of PD-VH and PD-noVH patients for cortical thickness and surface area we carried out structural covariance analyses, that assess the covariation of differences in grey matter morphology between different brain structures. After specifying a general linear model for each region, the structural covariance matrices (68x68) of the two groups were defined by estimating the inter-regional correlation between model residuals of thickness and area (in separate models).
Cortical thickness. Significant difference of the two covariance matrices (PD-VH, PD-noVH) was first tested (χ 2 =3010.82, df = 2278, z of differences = 3.83). The cell-by-cell comparisons of residuals' inter-regional correlation coefficients highlighted differences in the interregional covariance, in particular in the left inferior temporal gyrus, supramarginal gyrus and inferior parietal lobe (IPL), superior frontal (SFG) and inferior frontal gyrus (IFG) pars opercularis and the fusiform and lateral occipital gyri on the right (Figure 4). Overall, inter-regional correlations were greater for the PD-VH group (but see also S8).
All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted February 19, 2021. ; Figure 4. Regions with the most significant difference in inter-regional correlations of cortical thickness between groups: the inter-regional correlations for these regions were significantly greater for VH patients. Shown in the circular plot, only the inter-regional correlations with a difference greater than 0.3 in the r 2 (for details and z scores on all differences, see S8). Legend Hubs, that is nodes (here regions) that are thought to strongly contribute to the global network function, were identified in frontal, parietal and occipital regions for the PD-noVH group, and in frontal, temporal and parietal regions for the PD-VH group (Figure 5a). Permutation tests for vertex-level measures returned differences in betweenness centrality, which was greater in PD-VH in the left and right lingual gyrus, in the left lateral occipital gyrus and the right SPL (p FDR <.05). Communities are sets of brain regions characterised by denser and stronger relations among themselves, if compared with regions of other communities. Structural covariance-based communities have been found to replicate neighbourhoods observed with seed-based approaches in fMRI and DTI (see Methods for details). The first community in the PD-VH group comprised mainly occipitotemporal regions, with the second involving parietal and some frontal regions. In the PD-noVH group, the first community consisted of mostly frontoparietal regions whereas the second comprised occipitoparietal regions (Figure 5b). In addition, the PD-noVH group All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in showed higher modularity, as assessed with bootstrapping (mean = 0.29 SD = 0.02, CI 0.25, 0.36 at density 13%) (for communities by lobe, see S8). perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in Finally, we found no significant correlation between difference of the means in thickness and difference of the means in graph-level measures.
Surface area. As for thickness, the two covariance matrices were different (χ 2 = 5347.2 , df = 2278, z of differences = 6.8). In addition, among the others, significant differences in interregional covariance were observed bilaterally in the rostral MFG, STS, fusiform gyrus, and IPL; in the left caudal MFG, lateral occipital gyrus, SPL, and insula and in the right anterior and posterior cingulate, and IFG pars opercularis, with a pattern very similar to the one observed for thickness (see Figure 6 and S8). most significant difference in inter-regional correlations of surface area between the groups: these correlations were significantly greater for PD-VH. Only the inter-regional correlations with a difference greater than 0.3 in the r 2 are shown (for details see S8). Legend  Finally, we found a significant positive correlation between difference of the means in the NPI subsample and with the difference in local efficiency (r = .24, p = 0.02), whereby the greater the difference in the surface area, the greater the difference in the local efficiency. The regions with both the greatest area All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted February 19, 2021. ; differences and efficiency differences were the bilateral lingual gyrus, lateral occipital gyrus, right cuneus and right insula.

Discussion
We have presented a mega-analysis of patients with Parkinson's disease with and without visual hallucinations, demonstrating widespread alterations in brain structure, with differential effects for cortical thickness and surface area and examined their relationship to receptor distributions and network-level effects. Below we discuss the implications of the findings and their relationship to current theories of VH.

Cortical thickness and surface area
Cortical thickness and surface area (SA) are considered two separate components in ageing and disease However, not all regions are equally affected and, notably, there appears to be a posterior asymmetry with relative sparing of the left ventral visual stream (ventral occipito-temporal cortex) compared to the homologous region in the right hemisphere. This region plays a key role in all models of VH in PD but a greater involvement of the right hemisphere has not been noted previously. The PCA analysis helped define key sub-regions within the extensive areas of cortical thinning that contributed most to the group difference, identifying a frontal and an occipital dimension. Of these, the cuneus bilaterally and left dorso-medial aspect of the superior frontal gyrus emerged as the dominant components. These regions have been reported in previous studies but do not play a prominent role in accounts of VH in PD. The cuneus is one of the earliest All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted February 19, 2021. ; regions to show cortical atrophy in PDP, present at the earliest stages when only minor hallucinations occur (Pagonabarraga et al., 2014), while cortical thinning in the dorso-medial superior frontal gyrus has been reported in patients, months to years prior to the development of VH (ffytche et al., 2017). It may be that the prominence of these regions in the mega-analysis relates to the longer duration of these changes compared to other brain regions resulting in a greater consistency of thickness reduction between patients.
For SA, the differences between groups were more circumscribed with bilateral medial occipital SA reduction for patients with VH in a region corresponding to the primary visual cortex and its surrounds (striate and extra-striate cortex) and the left insula. This is the first-time such extensive structural changes have been identified in the primary visual cortex and its surrounds in PD patients with VH and helps account for wide-ranging low-level visual deficits found (see Weil et al., 2016 for a review). These regions also have reduced cortical thickness but their prominence in the SA analysis may imply additional gyral atrophy, sulcal widening and a reduction of underlying white matter.
The mega-analysis also allowed us to move beyond a binary comparison of VH versus noVH to examine brain regions linked to VH severity as measured by the NPI hallucination subscale score (a composite score derived from the product of frequency and distress ratings) and taking into account any variability associated to age, gender, TIV, medication, cognition, disease onset and PD severity. Regions with reduced thickness for higher severity scores (negative correlation) were found in posterior parietal, posterior cingulate and superior temporal cortex while a region in the frontal lobe was found with greater thickness for higher severity scores. Previous studies have associated these regions with mental rotation and , thus one can infer an involvement of these processes and these regions in VH severity. In addition, the IPS is also part of the dorsal attentional network, previously implicated in VH in PD (Shine et al., 2014a; see discussion below). These regions were also identified as hubs in the structural covariance analysis, discussed further below. As separate distress and frequency scores were available only for a part of the subsample, we were unable to analyse the two components of the severity score separately, thus we cannot disentangle whether these correlations are All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted February 19, 2021. ; driven primarily by the frequency or distress measures. However, this is the first time a link between cortical structural changes and phenomenological aspects of VH severity has been identified.

Subcortical regions, hippocampus and cerebellum
In addition to the detailed analysis of the cerebral cortex we were able to examine the volumes of subcortical structures as well. Bilateral volume reduction was found in the amygdalae. Lewy bodies have been found in the basolateral nucleus of the amygdala associated with VH in PD patients at a similar level of cognitive impairment to those studied here (Harding et al., 2002) that may account for this finding. Unlike the amygdala, there are only sparse Lewy bodies in the hippocampus at this disease stage and volume changes in this structure are more difficult to interpret. Since the prevalence of VH increases as PD progresses, tracking cognitive progression from PD-MCI to PD-dementia, it is difficult to disentangle brain changes related primarily to cognitive decline from those related primarily to VH or that may contribute equally to both. Reductions of hippocampal volume (particularly its anterior portions) have been found in some, but not all, studies of VH in PD, depending on whether patients are matched for cognitive decline (e.g. Yao et al., 2016; Ibarrexete-Bilbao et al., 2008). Here we found smaller left (and a trend for right) hippocampus in the NPI sample where we were able to covary for age, gender, TIV, onset, LED, PD severity and cognition.
We did not find hippocampal volume reduction in the full data set covarying only for age, gender and TIV.
The volume reductions in the NPI analysis cannot be explained by differences in cognition or PD progression between groups, confirming a role for the hippocampus in the mechanism of VH that is independent of cognition (e.g. Ibarrexete-Bilbao et al., 2008), thus highlighting the need to carefully design studies and control for cognitive and disease factors when examining hippocampal contributions to VH. The thalamus has been suggested as a key hub linking several cortical networks associated with VH in PD (Weil et al., 2019). We did not find altered thalamic volumes in PD-VH in the main analyses or subgroup NPI analysis which included a wider range of covariates (S4). This does not rule out involvement of the thalamus in the pathophysiology of VH in PD but does suggest any functional changes in this structure are not associated with volume loss. Finally, reduced volume in cerebellar lobules VIII, IX/VII and Crus 1 is associated with VH in PD (Lawn and ffytche, 2020). Freesurfer does not segment specific cerebellar subfields but volume changes were found in cerebellar white matter that may relate to these cerebellar cortical changes (Lawn and ffytche, 2020). All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in

Neurotransmitter receptor density and structural imaging changes
There is only sparse Lewy body pathology in the cortex of PD patients with VH at the disease stage included in our analysis (Harding et al., 2002), raising the question of what causes the extensive cortical changes found in this and previous studies. One possibility is that such cortical changes represent synaptic loss with volume loss and the cortical distribution of these neurotransmitter systems has yet to be examined. We were able to investigate this relationship for subtypes of dopamine and serotonin receptors for which high resolution maps are available and found that cortical regions with higher binding had increased cortical volume loss. The association, in particular for 5-HT2A, was confined to regions linked to VH rather than the cortex as a whole, suggesting the neurotransmitter effects were specific to VH, consistent with the possibility that degeneration in these neurotransmitter systems in PD underlies synaptic loss and cortical thinning. Receptor binding maps for D2/D3 and 5-HT2A were not correlated suggesting different cortical regions contributed to the associations found for D2/D3 and 5-HT2A. 5-HT2A and 5-HT1A binding maps were correlated so the same cortical regions are likely to have contributed to both serotonin findings. While increased binding was associated with increased thickness loss, the opposite association was found for SA, with higher binding exhibiting less SA reduction. This finding was not specific to VH regions for dopamine and 5-HT1A so may reflect a different process to the thickness alterations found. It is also unclear what causes low binding regions to be associated with increased loss of SA. Finally, there was no suggestion of a greater contribution of one neurotransmitter system over the other to cortical thickness loss, with equivalent slopes for all three receptors maps examined.

Structural covariance
The examination of inter-regional correlations, with areas sharing reductions in thickness or SA considered part of a functionally connected network, showed that regions of greater inter-regional thickness , and the inter-regional covariance findings support this view. In contrast, the inter-regional SA covariance findings highlight key DMN regions in medial frontal and posterior cingulate cortex. These regions were not found to have reduced SA in PD-VH, suggesting a relative preservation of the DMN compared to VAN and DAN. Indeed, results from dynamic fMRI have indicated active coupling between the DMN and the visual network, which correlated with the frequency of misperceptions, as opposed to reduced connectivity between the DMN, VAN and DAN . Other metrics derived from the covariance structure include hubs defined by the richness of their interconnections and communities defined by their local strength of covariance. Hub metrics for thickness in the occipital lobe and parietal lobe were stronger in patients with VH, suggesting cortical thinning has a wider impact on the network in these patients, highlighting the importance of functional alterations in early visual areas in VH. One could argue that VH may not only depend upon on areas presenting neural pathology, but also on areas that may be relatively unaffected but operate in a network where there is pathology elsewhere, thus becoming functionally pathological while structurally intact (ffytche, 2008). Indeed, all the regions where richness of connections was either lower or higher for PD-VH fell outside areas of reduced SA in VH, suggestive of a more functional pathology which needs to be further explored with functional connectivity. Finally, there were qualitative differences in the communities of highly associated regions for PD-VH compared to PD-noVH in both the thickness and SA analysis. Of particular note was the extent of interconnected areas in the ventral, lateral and medial temporal lobe that was larger in the PD-VH group. These regions had reduced thickness in PD-VH implying the local extent of thickness reduction is greater in PD-VH.

Strengths and limitations
This is the first mega-analysis of VH in PD, pooling data to create the largest sample of PD patients with and without VH tested to date. While this is a major strength of the study, it also introduces complexities that smaller studies do not have to address. One is the variability of clinical data available for each site, limiting the analyses we could perform with the full dataset of 493 participants. This means that some of the key analyses, for example those related to cognitive covariates and disease duration or symptom scores, All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted February 19, 2021. ; could only be carried out in a smaller sample of 146 participants, but this is still substantially larger than any previous study. Another complexity is the need to address variance in the data caused by scanning at different sites and scanner types. Previous studies have typically used voxel-based methods to examine structural differences between PD-VH and PD-noVH. We used a different method to allow us to harmonise data between sites and examine cortical thickness and SA separately, but this means our findings are not directly comparable to those of previous studies. The primary focus of the study is on the cerebral cortex so we have not attempted to examine the detailed anatomy of regions such as the basal ganglia, hippocampus, cerebellum and thalamus that may have a role in VH. Finally, we do not have access to high resolution density maps for cholinergic receptor subtypes which limits the range of neurotransmitter analyses we can perform.

Conclusions
The mega-analysis has allowed us to resolve several uncertainties in the previous literature and describe new features of the VH phenotype in PD. With a sufficiently large sample, more widely distributed cortical involvement emerges than previously suspected with the important novel finding of involvement of the primary visual cortex and its surrounds. Structural covariance modelling has helped dissect out networks linked to attentional control within the widespread cortical regions affected, adding further evidence for the role of these networks in PD-VH. The findings also help resolve ambiguities between structural correlates of general cognitive decline or PD progression and those specifically related to VH. Patients at the same stage of PD and general cognitive impairment who experience VH have lower hippocampal volumes than those who do not. The hippocampus does not currently play a central role in models of VH in PD and our findings suggest this needs to be reconsidered. We can argue that the hippocampus represents part of an extended DMN composed of functional hubs, a dorsal medial subsystem and a medial temporal subsystem, which includes the hippocampus (Qi et al., 2018). Thus, structural covariance, graph-level analyses and structural hippocampal imaging point to the involvement of attentional control networks in PD-VH. Finally, the findings shed light on why widespread cortical changes occur at a stage of PD with only sparse cortical neuropathology. The associations between dopaminergic and serotonergic receptor binding and cortical thickness provide the first evidence that the cortical changes may be driven by neurotransmitter reductions, raising the possibility of novel interventions to mitigate these effects at an earlier stage of disease.
All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in MRI data was pre-processed with Freesurfer 6.0.0 (Fischl, 2012;Dale et al., 1999) to estimate cortical thickness, surface area and subcortical volumes. Data was processed on King's College London HPC infrastructure Rosalind (https://rosalind.kcl.ac.uk), with the standard recon-all procedure, consisting of motion correction, skull-stripping, affine registration to Talairach atlas, segmentation, smoothing, and parcellation mapping. In order to screen for possible errors in the segmentation process, mean cortical thickness measures and manual slice by slice inspection were used to identify possible errors in the whitegrey matter boundary and pial reconstruction steps. For subjects that did not segment properly the failed processing steps were re-run (autorecon3) after performing the appropriate corrections. Low quality scans (e.g. with excessive motion, n= 4) or scans that did not segment well upon troubleshooting (n =2) were discarded. Individual cortical thickness, subcortical volumes and surface area measures were extracted based on the Destrieux atlas (Destrieux et al., 2010). In order to explore structural differences between patients with and without VH across the different cohorts minimising variance due to different recruitment sites and, therefore, different scanners, we used a harmonisation method. ComBat is an empirical Bayesian algorithm aiming at minimising the variance due to the scanner features and to maintain the variance related to biological features and has been previously successfully used in studies of cortical thickness (Fortin et al., 2018;Radua et al., 2020). In this study, this method has been also used to harmonise volume and surface area for each participant (see Supplemental information S1 for more details about this method and plotted results). perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted February 19, 2021. ; https://doi.org/10.1101/2021.02.17.21251558 doi: medRxiv preprint core team, 2017). Results are presented in Figure 1, created with a custom colour coding based on p values and by overlaying region labels on a brain render.

Group differences analysis
We used Tukey's method programmed in R with the 1.5*IQR rule to identify outliers other than those excluded upon unsuccessful pre-processing. This allowed the careful inspection of the identified subjects in order to verify whether the outlier value depended upon measure errors (e.g. harmonisation bugs) or incorrectly entered data, or on the subject, with the purpose of retaining outliers depending on the subject (e.g. intrinsic features of the subject). No participants were discarded upon this check for this analysis.

Sensitivity and Subgroup analysis.
Of the eight original groups, three used the Neuropsychiatric Inventory (NPI) to score visual hallucinations. For this subgroup of studies, patients were matched for age, gender, onset, levodopa equivalent daily dose (LEDD), and Mini Mental State Examination (MMSE) score. Within each of the 3 studies, patients were also matched in terms of motor symptoms severity (UPDRS-III). We also ran a oneway ANOVA to check whether the subsample was matched for UPDRS-III but data was missing for 20 participants. We computed the group mean and used that to fill the missing value for the between groups multivariate ANOVA. We carried out Pearson's product moment correlation coefficient between NPI score and the cortical thickness, surface area and subcortical volume data; we computed the same analysis with LED as a covariate in order to address its potential role in VH severity. In addition, we compared the PD-VH and PD-noVH in the data set using the original VH binary scores to check for consistency in the results with the larger data set, including age, gender, disease onset, LED, PD severity (UPDRS-III) and MMSE as covariates (Supplemental Information S4). We also conducted analyses of variance with a larger subgroup and with graded VH scores (mild, moderate, severe), together with an ordinal logistic regression (for details on the sample, methods and results see Supplemental Information S5).

Receptor density profiles.
Regression models with the difference of the means (VH -noVH) of morphometrical features (thickness, surface area, subcortical volume) as dependent variable and receptors density profiles as predictors were reported in S3 were too small in number to carry out a more powered model. We ran separate models for each receptor and for thickness, surface area and volume. In addition, for each receptor we ran three different models. First, we examined the relationship between the receptor's binding potential in the regions with significant differences in cortical thickness/surface area between PD-VH and PD-noVH. The sloopes for these models were also compared (Supplemental Information S6). Then, in order to better investigate such relationship, we also assessed whether the receptor's binding potential could predict thickness/area values for all regions; finally, with the same purpose, we ran models considering only regions where the difference between the groups was not significant (S6). Linear regression models were coded in R using the packages rstatix (Kassambara, 2020)  was marked as highly influential, explored and if appropriate discarded (Cook and Weisberg, 1982). In addition, the confidence intervals of the significant regression models were estimated with the bootstrapping technique (Efron and Tibishirani, 1986) with 100,000 cycles (S6). Methods and results for thickness and All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in Separate PCAs were carried out for thickness and surface area. PCA inputs comprised the significantly different regions from the MANCOVAs (S3). Results are presented in Figure 3, created with a custom colour coding based on the components and by overlaying region labels on a brain render. To further explore a possible relationship of PCA components and hallucination severity, we carried out correlational analyses (Pearson product-moment) between the individual contributions to the different PCA dimensions, the NPI scores in the NPI subsample and the mean thickness and surface area of each dimension/component, that is the mean thickness/area across the regions constituting that component.

Structural covariance and graph theory analysis
In order to investigate inter-regional properties to explore and characterise the gray matter network-level organisation of PD-VH, we built networks based on structural covariance, a technique that assays covariation of differences in grey matter morphology between different brain structures across a specific population (Lerch et al., 2017;2006). Since the most widely used atlas for this kind of analysis is the perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted February 19, 2021. ; https://doi.org/10.1101/2021.02.17.21251558 doi: medRxiv preprint with the same procedure described in section 4.3.1. The dataset was reduced to 467 cases as the design matrix based on the full dataset was not invertible due to high collinearity of some columns. We discarded N=26 subjects comingfrom the smallest datasets and the problem was overridden. Homogeneity of groups was verified with a Levene's test (Nimon, 2012;Cheung, 2019).
The dataset counts 467 subjects, 118 PD-VH, 349 PD-noVH, with participants being matched for age. Age and gender were used as covariates in the models. Analyses were carried out with R package braingraph (Watson, 2018;Watson et al., 2018) and igraph (Csardi and Nepuz, 2006). To construct the networks, first we specified a general linear model for each region (thickness/area as outcome variable, age and gender as covariates). The structural covariance matrices of the two groups were defined by estimating the inter-regional correlation between model residuals of thickness and area (in separate models) (e.g. He et al., 2007) to build a 68x68 matrix. A density-based threshold (Fornito, Zalesky and Bullmore, 2016) was applied to the matrix in order to retain a percentage of the most positive correlations as non-zero elements in a binary adjacency matrix. Different densities ranging from 0.05 to 0.20 with a 0.01 step size were explored. The differences between PD-VH and PD-noVH covariance matrices were then computed, first to establish that the two matrices differed significantly from one another; secondly, a cell by cell comparison was carried out to establish which covariance patterns were significantly greater for the PD-VH group compared to the PD-noVH group. Random undirected and unweighted graphs were created for each group, and vertex-and graph-level metrics were computed for each group. For visualisation purposes a density of 0.13 was selected. Vertex importance was assessed using degree, betweenness centrality and nodal efficiency. A hub was categorised as such if its betweenness centrality was greater than the mean plus 1 standard deviation -calculated on all vertices at the same density (e.g. Bernhardt  perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in Finally, to further assess the relationship between graph level metrics and visual hallucinations in the full sample, we computed Pearson's correlation coefficients between the difference of the means of graph metrics of interest (vulnerability, transitivity, local and nodal efficiency, path length, betweenness centrality, eccentricity, distance) for the models on thickness and surface area separately, and the difference of the means in thickness and in surface area, respectively, with the NPI subsample, for which we have all clinical and demographic information and in which participants are matched on all those variables.

S1. The ComBat harmonisation method and its application on this dataset.
As noted in the Methods section, this algorithm was first developed for genomics analyses requiring to pool together data from different datasets (Johnson, Li and Rabinovic, 2007) and has been recently adapted to other measures, such as cortical thickness (Fortin et al., 2018). ComBat applies empirical Bayes (EB) estimation to account for site-specific variance by using an error with a multiplicative scaling factor that is scanner specific. The model allows to retain biological variance, such as that related to age, gender and disease (in our case, hallucinators and non hallucinators), variables that are entered the model as a vector.
The algorithm is used on segmented and parcellated data, before running any other statistical analysis and has proven extremely reliable for this type of analyses as described in Fortin et al., 2018 and more recently by Radua et al., 2020. Hierarchical Bayes (HB) is also being proposed to harmonise data from different sites (Kia et al., 2020 1 ). However, we chose to use EB and as the purpose of this study was not to assess the suitability of harmonisation models but to build on such harmonisation to develop models for the understanding of the mechanisms underlying visual hallucinations in PD. Secondly, EB is less dependent on the hyper-parameter settings and thus uses data more directly, which for our purposes was the most suitable choice, as HB would possibly be more suitable for approaches whereby a careful specification of hyper-parameters is crucial, as is the case for normative modelling (Kia et al., 2020 1 ). perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted February 19, 2021. ;

S2. Additional information on the full sample.
Meta-analysis to determine whether participants in the full sample (N=493) are matched on the relevant demographic and clinical variables. As we did not have raw data for all groups for all relevant variables, we tested whether our participants were overall matched for age, onset, MMSE, UPDRS and LED. The analysis was carried out with R package 'metafor'. Results show that participants are overall matched for age, PD motor severity as assessed by the UPDRS-III total score, onset and levodopa equivalent daily dose (LED). The model shows a trend towards significant driven by the KCL dataset (N= 15) for the MMSE variable.
Information on values for brain volume, total intracranial volume (TIV), total gray matter volume perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted February 19, 2021. ; https://doi.org/10.1101/2021.02.17.21251558 doi: medRxiv preprint S4. NPI subgroup custom MANCOVAs with binary score (PD-VH vs. PD-noVH) and correlation analysis (VH only) with LED as covariate.
The NPI subgroup (146 patients, 79 PD-noVH, 67 PD-VH) is the group for which we had the richest clinical dataset, as for most of these patients we have not only age and gender, but also disease onset, levo-dopa equivalent daily dose (LED) and mini mental state examination (MMSE) data, PD severity, together with the continuous NPI score. 126 had UPDRS-III scores. 20 from the same group did not have those scores.
We computed the mean for the group regardless of VH and used that score in our model. We ran a custom multivariate ANOVA model with MMSE, gender, age, TIV, UPDRS-III, LED and onset as covariates, controlling also for the interaction of those covariates that correlated with each other (age * LED + age * MMSE + onset * LED + onset * UPDRS3 +LED * MMSE). Finally, we carried out the same correlational analysis described in the main text, but with LEDD as a covariate, as it might be related to severity of VH, obtaining the same results: negative correlations were found for the right superior temporal sulcus (r = -.26, p = .03), the right inferior parietal sulcus (r = -.24, p = .05), the right Jensen sulcus (r = -.28, p = .02) and the right cingulum marginalis (r = -.24, p = .056), which was a trend in this case. In addition there was the positive correlation reported also in the main text with the right frontomarginal gyrus (r = .26, p = .04).

Demographics
All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in As there is not a specific criterion to do so, in order to create an ordinal variable, for patients we had UPDRS scoring we have retained that (0 = no hallucinations or delusions, 1= illusions or non formed hallucinations, 2 = Formed hallucinations independent of environmental stimuli, 3= Formed hallucinations with loss of insight, 4= patient has delusions or paranoia (but no patients had a UPDRS 4 score or a NPI, NEVHI, MIAMI equivalent). We have "translated" the NPI continuous score to such a scale, with scores ranging from 1 to 3 categorised as "minor/mild VH", scores raging from 4 to 8 as "moderate" and scores above 9 as "severe"; for the MIAMI we have used a similar procedure, as the scale ranges from 0 to 14.
Finally, for the NEVHI, we have used the raw data from the interviews to determine whether the person had mild, moderate or severe visual hallucinations.
To isolate regions where there was some difference between the groups in order to create a multiregion ordinal regression model, we ran a multivariate ANOVA for each morphometric measure (subcortical volumes, SA, cortical thickness) was ran with degree of hallucinations as between subjects factor on hallucinators only (McCrum-Gardner, 2008) and with age, onset, gender, TIV as covariates. The lack of significance when considering the comparison with those with a 'severe' score may be due to different reasons, with one being that the different PDP scales used in the different studies have slightly different criteria, but also that patients with higher scores may have delusions as well, which may as well be in the same continuum as visual hallucinations, but this does not necessarily imply a graver atrophy in the same regions underlying VH. Finally, the three groups have very different sample sizes, and this might as well affect the analyses, as one can notice the large variability within each group.
All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted February 19, 2021. ; https://doi.org/10.1101/2021.02.17.21251558 doi: medRxiv preprint S6. Additional models and details about receptor density maps.

a) Additional models for thickness and surface area.
We used bootstrapping to estimate the confidence intervals of the regression models that resulted significant and that are reported in the main text. In addition, since both 5-HT1A (β =.181 , t = 2.07, p = .04) and D2/D3 (β =.277, t = 3.24, p = .001) resulted as significant predictors also when considering all regions, we repeated the same procedure, finding that the bootstrap for the coefficients resulted significant for 5-HT1A, p = .008, CI (0.06, .40) and for D2/D3 p < .001, CI (180.08, 516.77).
For regions where there was a relationship between receptor density and structural morphometrics for all regions, additional analyses were conducted to assess whether this was simply the same effect as the relationship of the regions that differed or a feature of the whole brain by examining the non-significant regions.
Cortical Thickness. The models with 5-HT2A binding potential per region as predictor and the mean difference per region (considering only the regions where no difference between groups was observed) as dependent variable resulted not significant for regions that did not differ (β = -.013 , t=-.09 p=0.93); a similar result was obtained for 5-HT1A (β =.05 , t=.35, p= 0.72) and D2/D3 (β =.24 , t=1.72, p= .09) for the same regions.
Surface Area. The models with 5-HT2A binding potential per region as predictor and the mean difference per region (considering only the regions where no difference between groups was observed) as dependent variable resulted not significant for regions that did not differ (β =-.013 , t = -.067 p = .5); a similar result was found for 5-HT1A (β =-.05 , t = -1.18, p = .3), whereas D2/D3 was a significant predictor also for this model (β =.243, t =2.91, p =.001).
All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in Slope comparisons. For thickness we compared the slopes for 5-HT2A, 5-HT1A and D2/D3 for the differing regions, finding that the three slopes did not differ (p> .05). For surface area we observed a similar result (p > .05).
All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in  perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in