Dopamine signaling enriched striatal gene set predicts striatal dopamine synthesis and physiological activity in vivo

The polygenic architecture of schizophrenia implicates several molecular pathways involved in synaptic function. However, it is unclear how polygenic risk funnels through these pathways to translate into syndromic illness. Using tensor decomposition, we analyze gene co-expression in the caudate nucleus, hippocampus, and dorsolateral prefrontal cortex of post-mortem brain samples from 358 individuals. We identify a set of genes predominantly expressed in the caudate nucleus and associated with both clinical state and genetic risk for schizophrenia that shows dopaminergic selectivity. A higher polygenic risk score for schizophrenia parsed by this set of genes predicts greater dopamine synthesis in the striatum and greater striatal activation during reward anticipation. These results translate dopamine-linked genetic risk variation into in vivo neurochemical and hemodynamic phenotypes in the striatum that have long been implicated in the pathophysiology of schizophrenia.


Supplementary Methods
Genotype data processing LIBD, NIH and UNIBA cohorts.Participants underwent blood withdrawal for subsequent DNA extraction from peripheral blood mononuclear cells.To this aim, approximately 20ml of fresh blood was obtained through a conventional venous blood collection with 10ml EDTA Vacutainer Venous Blood Collection Glass Tubes (Vacutainer ®).Approximately 200 ng DNA was used for genotyping analysis.DNA was concentrated at 50ng/μl (diluted in 10 mM Tris/1mM EDTA) with a Nanodrop Spectrophotometer (ND-1000).Samples were genotyped using variate Illumina Bead Chips including 510K/610K/660K/2.5M.
Of each pair of related individuals, the one belonging to the group with greater numerosity within each cohort is dropped from the final datasets.
Genotype imputation was performed using the pre-phasing/imputation stepwise approach implemented in IMPUTE2 / SHAPEIT (default parameters and chunk size of 3 Mb for LIBD and UNIBA -250 Kbp for NIH) and using Phase 3 1000 genome as reference panel 7,8 .After imputation, imputed dosage data for each SNP with imputation quality (INFO) > 0.9 were used for polygenic risk scores (PRS) calculation.
For LIBD cohort, the first 10 principal components of the whole genome data were calculated using EIGENSOFT v5.01 (EIGENSOFT, http://www.hsph.harvard.edu/alkes-price/software/)and considered as nuisance covariates in genetic analysis while for UNIBA and NIH cohorts the SNPRelate R package 9 and PLINK (version 2) were respectively used.. KCL cohort.DNA was extracted from whole blood samples or cheek swabs using standard procedures 10 .Genotyping was performed at Cardiff University, using HumanCore Exome 1.1 arrays ("Psych-chip", Illumina, San Diego, California, USA).Genotype quality control (QC) was performed according to standard parameters 11 .PLINK (version 1.9) was used to implement pre-imputation quality control procedures.Individuals were excluded from the sample if estimates of the inbreeding coefficient F indicated ambiguous sex (F = 0.2 -0.8), or if there was a discrepancy between their genotypic and self-reported sex.Any discrepancies were checked with the data collection site to confirm no errors were made during manual entry of phenotypic data.Samples with a genotype call < 95%, or high pairwise relatedness (pi-hat > 0.

SDA run and post-processing
We run SDA 10 times using the following parameters: --N = 238 (number of samples) --max_iter = 3000 (number of iterations) --num_comps = 238 (number of components) --seed = 400 (used for results replication) Running the method multiple times, we obtained some components that are found consistently across multiple runs, whereas other components only occur in a small number of them.To identify robust components, the authors of the paper implemented a method that clusters similar components across different runs.We then focused on large clusters containing components from multiple different runs and used these as the basis for further analyses.More specifically, at the end of each run we stored the individual and tissue scores, gene loadings and Posterior Inclusion Probabilities (PIPs = a ranking measure to assess whether the data favors the inclusion of a variable in the regression model, i.e., the probability that, after the SDA decomposition, each gene with his loading or weight belongs to a specific component).After the 10 runs finished, following what the authors did in the paper, we calculated the absolute correlation between the individual scores for all pairs of components across the ten runs.Hierarchical clustering was then used to group components into clusters, using one minus the absolute correlation as a dissimilarity measure.The clustering was terminated when no correlations between clusters were above 0.4.Although authors of SDA used 0.6 as threshold, we decided to use a more lenient one to maximise the information considering our significantly smaller sample size.The components within each cluster were then combined.We took the mean of the individual scores, tissue scores and gene loadings and the median PIPs to obtain the most representative value within each cluster (centroid).We finally obtained 126 robust components and we used PIP threshold > .5 to identify genes within each component.

Parsed-PRS association with KCL PET data across different ancestry clusters
Since the KCL discovery cohort was the most heterogeneous in terms of ethnicities included PC2 cutoff = .013;N = 102; Supplementary Fig. 4a) and one including only individuals within the European (EUR) ancestry group (PC1 cutoff = 0; PC2 cutoff = -0.049;N = 88; Supplementary Fig. 4a).Finally, after matching samples based on imaging data availability, for each of these clusters as well as for the whole KCL cohort, we evaluated the association of the C80 stratified PRS (C80-PRS) as well as its complementary score (C80-PRS-complementary) with [18F]-FDOPA uptake in the striatum indexed by Ki in the same identical fashion as the one reported in the main text (see Methods for details).
We found C80-PRS positively associated with greater striatal dopamine synthesis capacity as measured by [18F]-FDOPA specific uptake in NC and in patients with SCZ in the whole striatum and striatal subdivisions ROIs derived from Mawlawi, Martinez 16 and as described in McCutcheon, Beck 17 analysed in both ancestry clusters as well as the whole KCL cohort (Supplementary Fig. 4b).
Interestingly, no significant association was found with complementary C80-PRS.

Exploratory genetic risk association and biological characterization analyses of the other identified components
We explored the association of the 69 identified components with SCZ PRS using the same multiple linear regression methods as reported in the main text (see Methods).We then looked at their biological characterization through enrichment and over-representation analyses to identify enrichment for SCZ risk genes as well as alternative pathways of risk convergence (see Methods for details).We found six additional components out of 69 associated with SCZ PRS at the nominal significance level (two-tailed ⍺=.05, uncorrected; Supplementary Fig. 6a).None showed significant effects of diagnosis in postmortem samples.Of these, only one component (C102) was also enriched for SCZ risk genes (empirical p <.05; Supplementary Fig. 1a).C102 also showed enrichment for MDD, ASD, and PTSD risk genes as well as for SCZ differentially expressed genes (DEGs) previously observed in the CN, DLPFC, and HP (all empirical p <.05; Supplementary Fig. 1a).The tissue loading matrix revealed the C102 component to be most active in the HP (Supplementary Figure 1a).Accordingly, cell specificity analysis showed enrichment for glutamatergic synapses of hippocampal CA pyramidal and dentate gyrus neurons, among others (Supplementary Fig. 6b).
Consistently, KEGG pathway analysis for C102 showed enrichment for glutamatergic synapse as well as for nicotine and morphine addiction-related genes (Supplementary Fig. 6c).Finally, we computed PRSs stratified for genes within each SCZ-PRS-associated component and evaluated their association with striatal dopamine synthesis capacity as measured by [18F]-FDOPA specific uptake in our discovery NC and SCZ cohorts (see Methods).Notably, none of the six components were significantly associated with striatal dopamine synthesis (Supplementary Fig. 6d).Additionally, in order to disentangle the specificity of the effects of C80-PRS on brain activity, we investigated the association of the six stratified PRSs with BOLD signal on the fMRI discovery cohort as previously described (see Methods).We observed no significant association with either reward anticipation or reward consumption (results not shown), further supporting the specificity of the C80-PRS for reward-related brain activity.

fMRI task layout
LIBD cohort.The general layout of a trial in this version of the MID task is divided into three phases: cue (2000-3000 ms), target (up to 1250 ms in the 'difficult condition', while up to 1750 ms in the 'easy' condition), and outcome (450 ms).Immediately prior to scanning, participants were instructed on the task to be performed in the scanner, including being explicitly informed of the meaning of each cue.Cues indicated, via background color, whether the trial would be more likely difficult, more likely easy, i.e., how long the target would likely stay on the screen, and the reward magnitude, i.e., $2 or $0 (control trial).The target was the appearance of paper currency (or blank paper in the control condition) over a money safe.Participants were instructed to respond with a single button press with their right thumb when the target appeared, and if the target was hit, i.e., response was made while the target was on the screen, the target (money bills or blank paper) fell into the money safe (450 ms).On the contrary, a trial was considered an error if the participant: a) did not respond to the target, b) pressed more than once to the target, or c) pressed prior to target appearance.In these cases, the target (money bills or blank paper) disappeared without falling into the money safe and participants did not receive money towards final payment.The outcome phase provided feedback for money earned from that trial.Intertrial intervals (2500-4500 s) were calculated to maintain the same number of trials regardless of reaction time variability.An extensive description of the task is provided by Kholi et al. 2018 18 .
UNIBA cohort.The general layout of a trial in this version of the MID task is divided into three phases: cue (2000 ms), target (up to 500 ms), and outcome (2000 ms).Immediately prior to scanning, participants were instructed on the task to be performed in the scanner, including being explicitly informed of the meaning of each cue.The target presentation duration was calculated based on individual performance during the training phase to achieve at least 67% success rate.Cue stimuli consisted of three different white geometric shapes: a full circle, representing a chance of gain 100 points (reward condition), an empty circle, representing a chance of gain 0 point (control condition) and a full square, representing a chance of losing 100 points (punishment condition).The target was the appearance of a white triangle.Immediately after the response, feedback appeared for 2000 ms documenting whether the participant had won or lost points as well as their cumulative total at that point.Participants were instructed to respond with a single button press with their right thumb when the target appeared to gain or to not lose points.A trial was considered a hit, when the response was made while the target was on the screen.On the contrary, a trial was considered an error if the participant: a) did not respond to the target, b) pressed prior to target appearance.The outcome phase provided feedback for money earned from that trial and the total amount earned so far.

Striatal parcellation of C80-related fMRI activations
To quantitatively assess the striatal subregions' overlap, we evaluated the distribution of the significant striatal clusters observed in the discovery and replication fMRI cohorts using canonical functional striatum ROI according to the parcellation described by Mawlawi, Martinez 16 , which defined three subregions-associative, limbic, and sensorimotor (Supplementary Fig, 7a)-that underlie distinct functions of the striatum based on the cortical afferents each of these subregions projects or receives.We calculated the percentage of clusters where we independently found a significant TFCE-FDR<.05effect of the C80-PRS, overlapping with the three striatal subregions, excluding voxels outside the grey matter.Both clusters predominantly fell within the associative striatum (Discovery=95%; replication=99%) with a minimal percentage in the limbic striatum in the discovery cohort (5%) and sensorimotor striatum in the replication cohort (1%).These findings suggests that the effect is convincingly related to activity in the same striatal sub-region encompassing the caudate nucleus (Supplementary Fig. 7b).
Furthermore, to quantify the extent of voxelwise overlap between the two independently detected clusters from discovery and replication fMRI analyses, we counted the numbers of spatially overlapping voxels corresponding to 6 (162 mm³; Supplementary Fig. 7c).Notably, the overlap is located in the head of the CN, the same region used in the postmortem study.While the spatial correlation of fMRI signals may blur the accuracy of signal localization across different scanners and experiments, the rest of these two overlapping clusters is still contained in the associative striatum as defined by Mawlawi, Martinez 16 .

ROIs analysis on individual striatal fMRI activations
To account for the potential influence of individual variability in the localization of the C80-PRS effects on BOLD signal, we complemented the voxelwise analysis by extracting the individual signal from striatum ROIs using an uncorrected threshold of <.005 19 .We employed the six striatum ROIs (right and left associative, sensorimotor, and limbic subregions per Mawlawi, Martinez 16   Source data are provided as a Source Data file. and as described above).Subsequently, we performed associations of the C80-PRS with the signal extracted from the individual activation maps from the left and the right ROIs.Given that the uncorrected signal derived from the ROIs may be susceptible to type-I errors, post-hoc associations were adjusted for multiple comparisons to account for the number of tests conducted (k=6).Extracting the averaged signal from the associative striatum ROIs, we confirmed the significant positive associations on the signal extracted from the right associative striatum ROI (discovery: t(84)=3.34;p[FDR]=.006;partial R 2 =0.12; replication: t(53)=3.45;p[FDR]=.005;partial R 2 =0.18) along with significant positive associations on the signal extracted from the left associative striatum ROI, though with smaller effect sizes (discovery: t(84)=2.39;p[FDR]=.05;partial R 2 =0.06; replication: t(53)=3.28;p[FDR]=0.009;partial R 2 =0.16;Supplementary Fig.8).No associations were significant in the limbic and sensorimotor striatum (p[FDR]>.05) in either sample.Nevertheless, the potential influence of individual variability in signal localization, coupled with the limited sample sizes could have prevented the identification of under-threshold BOLD effects at the voxel-wise level on the left hemisphere.
Scatter plots show SDA component scores (y-axis) as a function of polygenic risk for schizophrenia (x-axis) and include regression fit line with mean fitted values and related shaded 95% confidence interval.Nominal two-tailed p-values are shown.Source data are provided as a Source Data file.b, Heatmaps show cell-type marker genes overrepresentation using human (light-blue) and mouse (brown) single-cell atlases.Mean-rank Gene Set Test in the limma R package 21 was used to obtain enrichment p-values shown.FDR-adjusted p-values after correcting for multiple comparisons across components (N = 69) and cell types (human atlas = 10; mouse atlas = 24) are shown.See Figure 2 caption for abbreviations.c, KEGG enrichment of all six PRS-associated components.Overrepresentation analysis was performed using the clusterProfiler R 20 package and FDR-adjusted p-values are reported.Diamonds represent each KEGG category (y-axis) enriched for each component (x-axis) and are colored based on the respective adjusted p-value.d, Forest plots show metanalyses of the association between the six stratified PRSs with whole-striatum dopamine synthesis capacity in the PET discovery cohort.Fisher's r-to-z transformed correlation coefficients and related 99.5% confidence intervals are shown.

(
Supplementary Table1), together with the ancestry stratification reported in the main text for the main analyses, we also decided to evaluate different ancestry subdivisions based on the visualization of the first two PCA dimensions, i.e. top axes of variation.Following what we reported in the main text (see Methods), we used a procedure developed by the ENIGMA consortium that consists in performing a PCA on target data merged with the HapMap121 phase 3 reference dataset (https://enigma.ini.usc.edu/wp-content/uploads/2012/07/ENIGMA2_1KGP_cookbook_v3.pdf).For this analysis we included all KCL samples whose genotype information was available (N: 168).We then plotted the first two PCs against each other and defined two different clusters: one excluding individuals belonging to the African (AFR) and Euroasian (EA) ancestry group (PC1 cutoff = .048;