DRD2 co-expression network and a related polygenic index predict imaging, behavioral and clinical phenotypes linked to schizophrenia

Genetic risk for schizophrenia (SCZ) is determined by many genetic loci whose compound biological effects are difficult to determine. We hypothesized that co-expression pathways of SCZ risk genes are associated with system-level brain function and clinical phenotypes of SCZ. We examined genetic variants related to the dopamine D2 receptor gene DRD2 co-expression pathway and associated them with working memory (WM) behavior, the related brain activity and treatment response. Using two independent post-mortem prefrontal messenger RNA (mRNA) data sets (total N=249), we identified a DRD2 co-expression pathway enriched for SCZ risk genes. Next, we identified non-coding single-nucleotide polymorphisms (SNPs) associated with co-expression of this pathway. These SNPs were associated with regulatory genetic loci in the dorsolateral prefrontal cortex (P<0.05). We summarized their compound effect on co-expression into a Polygenic Co-expression Index (PCI), which predicted DRD2 pathway co-expression in both mRNA data sets (all P<0.05). We associated the PCI with brain activity during WM performance in two independent samples of healthy individuals (total N=368) and 29 patients with SCZ who performed the n-back task. Greater predicted DRD2 pathway prefrontal co-expression was associated with greater prefrontal activity and longer WM reaction times (all corrected P<0.05), thus indicating inefficient WM processing. Blind prediction of treatment response to antipsychotics in two independent samples of patients with SCZ suggested better clinical course of patientswith greater PCI (total N=87; P<0.05). The findings on this DRD2 co-expression pathway are a proof of concept that gene co-expression can parse SCZ risk genes into biological pathways associated with intermediate phenotypes as well as with clinically meaningful information.


Expression matrix pre-processing
Braincloud includes 30 176 probes. We included three types of probes: hHC (human constitutive exonic), hHA (human alternative exonic) and hHR (human mRNA). We excluded control probes (hCX and hCT), human ESTs (hHE) and human other (hHO; see Colantuoni et al. 2 ), for further details). Moreover, we excluded unnamed genes (e.g., cORF), except for protein-coding genes located in the loci associated with schizophrenia by the PCG consortium 3 . For genes with multiple probes highly correlated with each other (Bonferroni-corrected α < 0.01, i.e. Pearson's r = 0.36), we selected the ones with the greatest variance 4 .
Meta-data were available in Braincloud for each subject, i.e., demographical variables (age, sex, ethnicity) and sample quality features (RIN, pH, post-mortem interval). Since confounding variables may affect expression estimates 1,5 , we factored out their effects from each gene's expression by means of a stepwise linear model. The Akaike Information Criterion served to select the best set of predictors for each individual gene. The model included the main effect of the six variables above mentioned and the age-sex, age-ethnicity, sex-ethnicity and the age-sex-ethnicity interaction terms. Continuous variables that significantly deviated from the normal distribution in our sample (Shapiro-Wilk test) underwent rank-based inverse normal transformation following Blom procedure 6 .
Finally, each column of the table, i.e., gene expression, was transformed to reduce the impact of outliers and deviation from normality. Thus, we obtained a 199 × 23 636 matrix of Blomtransformed expression residuals reflecting for each subject transcription levels of all genes relative to the entire sample considered. These variables were used for network identification.

Network identification
WGCNA results in topological representation of the relationship between genes which can be used to define clusters of co-expressed genes (gene sets). Network identification is based on similarity between transcription-level profiles of the genes included across individuals. We used functions provided by WGCNA R package to build a Weighted Co-expression Network. First, we verified scale-free topology for any 1 < β < 20 and we chose the lowest value of β that satisfied the criterion (R 2 > 0.8 (7) ). The scale-free topology criterion was satisfied for β = 4 (R 2 = 0.86). The adjacency matrix was calculated by raising the correlation matrix to the selected β as determined by soft thresholding. Minimum gene set size was 40 genes and minimum height for merging gene sets was 0.05.

SNP ASSOCIATION STUDY
We focused our attention on the gene set that included the probe designed to identify the mRNA that is translated into the D2L. This transcript is one of the two products of alternative splicing involving exon 6 of DRD2. D2L is coded by all exons, whereas the short isoform (D2S) does not include exon 6. The probe we identified matched precisely exon 6 of DRD2. We computed enrichment of the gene set for genes located in the loci identified by the Psychiatric Genomic Consortium (PGC) by using a hypergeometric model. We obtained a list of the protein-coding genes encompassing 500 kbp up-and downstream to the SNPs identified by PGC. We considered the genes in this list as hits for the hypergeometric test.
Braincloud includes for each subject the genotypes of 654 333 SNPs spanning the whole genome. We selected for further analyses those SNPs that were also available in an independent dataset of participants recruited by our group (there were 360 119 SNPs available). Moreover, SNPs significantly deviating from Hardy-Weinberg equilibrium (α = 0.003) or with a Minor Allele Frequencies (MAF) < 0.1 were not included because the sample size was too limited to investigate rare variants. We used SnpVariationSuite (SVS; GoldenHelix, Bozeman, Montana) to map genes encompassed in the gene set into RefSeq Genes 63 UCSC build NCBI36 8 . We selected 2 046 SNPs falling into a window of 100kbp up-and down-stream each gene in the co-expression module. We evaluated pair-wise R 2 between markers within the same chromosome. We considered two SNPs independent when R 2 < 0.1 (3) . We then performed a priority LD pruning by iteratively discarding the SNP with the weaker association (lower F-value) with the ME. We used this procedure to enrich our selection for relevant variants (for further applications of a similar procedure see http://prioritypruner.sourceforge.net/documentation.html). The final selection included 658 SNPs with low residual interdependence. We ran a genotypic model in SVS that performs one-way ANOVAs separately for each SNP, selecting those which survived at α = 0.005 (uncorrected). This procedure does not allow, per se, the generalization of the findings to unseen subjects. Instead, we addressed the generalizability of this SNPs with specific cross-validation procedures and replication in an independent post mortem dataset (see below).
Nine independent SNPs were found associated with the ME and were thus selected for the computation of the PCI. We pooled the minor homozygous and the heterozygous samples whenever the sample size of the minor homozygous group was smaller than 10 to avoid biasing statistics and double-checked whether the association was still significant. We checked if ME and each allelic population complied with the normal distribution through Shapiro-Wilk tests. One of the SNPs (rs6504631) was excluded because the association with ME after pooling minor homozygous with heterozygous subjects was no longer significant.

POLYGENIC CO-EXPRESSION INDEX COMPUTATION
The computation of the PCI is based on Signal Detection Theory. We used the Discriminability Index D' to quantify the magnitude of the differences in the ME between the three genotypic populations of each of the eight SNPs selected. For each SNP, we used the major allele homozygote population (MH) as reference group. The criterion for D' computation was set at the average ME of the MH sample. Thus, D' is a measure of how ME discriminated the heterozygous and the minor homozygous samples from the MH group. By this definition, D' of the MH group for each SNP is 0. Positive D' implies greater co-expression levels compared to the MH population and vice versa. The procedure to compute the PCI has been described in detail by Pergola et al. 9 .
After having defined D' for each SNP, we defined the PCI of each participant as the arithmetic mean of D' of the selected SNPs. The greater the PCI, the greater the mRNA coexpression level of that individual.
We performed leave-one-out and k-out cross-validation of the weights of the SNPs in the PCI. These analyses tested whether the eight SNPs included in the PCI are subject to biases caused by influential observations. We defined 7 training sets of different sizes (N train = 40, 60, 80, 100, 120, 140, 160). For each training set we performed 10 000 random permutations and computed the PCI in the out-of-train observations, i.e., the test set. Finally, we assessed the association between the PCI in the test set and the ME for each permutation of each training set. Furthermore, we performed a K-fold cross-validation in which we kept all samples independent as a further test of the robustness of the results. This procedure rules out that the cross-validation results are driven by a subgroup of the subjects which, as an ensemble, bias the results.
We performed an ANCOVA with ethnicity and PCI as predictors and the ME as dependent variable to assess whether the association between PCI and ME differed among ethnicity groups.
Moreover, we computed the Pearson's r between the PCI and the ME in the two groups separately.
Finally, we asked whether our priority pruning biased SNP selection. In principle, this procedure yields a risk of overfitting because the seed for LD pruning is not random, but based on the target variable. The rationale is to keep the most important genetic loci and make SNPs statistically independent, which is a requirement of the PCI computation. To validate the SNP selection, we performed three analyses: i) we computed a leave-one out cross validation which, at each iteration, selected the first eight SNPs (i.e., as many as in the PCI) and associated the resulting PCI with the module eigengene in the left-out subject; ii) we computed 100 random-seed LD pruning and associated the resulting SNPs with the ME; then, we linked each SNP with the closest module gene and obtained a ranking of the genes based on their most significant SNP. We correlated these 100 rankings with the ranking obtained by our priority pruning and compared this population of correlations with that obtained through 100 random gene lists (null distribution). A large difference between the correlations obtained and the null distribution would suggest that our priority pruning does not bias the relationship of the genes with the module eigengene as assessed by the co-eQTL analysis. We assessed the difference using Cohen's d. iii) we evaluated the inclusion of increasing numbers of SNPs in the PCI. The rationale of this analysis is to select the SNPs that together explain the largest possible proportion of variance. Through this procedure, the PCI should be enriched for true positive markers 10 . In particular, we computed a series of PCIs following the ranking of the priority pruning. So, for instance, the first ranked SNP was included in the first PCI; the first two SNPs were included in the second PCI and so on until the 100 th SNP.
Critically, as in point i), we calculated the PCIs through a leave-one-out cross-validation. Therefore, for each individual we computed the PCI based on the SNP weights derived from the other 198 subjects. This procedure made the test set (i.e., the left out subject) independent of the training set (i.e., the 198 remaining subjects). Then, we correlated each of the 100 cross-validated PCI with the ME (first principal component of gene expression). Notably, to compute the ME we did not use the left out subject, but obtained the ME using the loadings of the first principal components in the training sample. Then, we evaluated and plotted the correlations. Following this analysis, we computed the percent change of the correlations in the PCI series. In particular, we computed a series of percent correlation changes between the second and the first PCI, the third and the second, and so on. We defined the percent correlation change as (r j+1 -r j )/r j , where r represents the Pearson's correlation between the PCI and the ME. We plotted the percent correlation change. All of these analyses were performed using custom-written scripts in the software R 3.0.2 64-bit.

co-eQTL replication
We assessed the overlap between the BrainEAC and the Braincloud probes by comparing the nucleotide sequences (see Supplementary Table 1 for details). Of the 80 probes which were present both in Braincloud and BrainEAC, 37 probes were fully overlapping based on their nucleotide sequence, 28 had partial overlap with the Braincloud probes, 15 showed no overlap. For the latter genes we used the gene-level expression estimates in BrainEAC. BrainEAC differs from Braincloud as far as RIN is concerned. Data were available for 127 healthy subjects from post mortem Frontal Cortex brain tissue; we removed 4 outliers for pH (iterative Grubbs test, all p < .05) and all observations with RIN ≤ 5 in the Frontal cortex. This decision was taken as a trade-off between comparability across datasets (only samples with RIN ≥ 7 were included in the analysis on Braincloud data) and sample size. The final sample included 50 healthy Caucasians. Data were then preprocessed as for the discovery co-eQTL study and we extracted the module eigengene. Finally, we computed the PCI of each individual and associated it with the ME using Pearson's correlation to replicate the co-eQTL results. Since our a priori hypothesis was that the direction of the correlation would be the same as in the discovery sample we used one-tailed probabilities and set our  = .05.
Supplementary Table 1 about here

Association of the module eigengene and of D2L expression with genomic principal components
We used the R package SNPRelate to compute genomic principal components (GPCs).
Braincloud includes for each subject the genotypes of 654 333 SNPs spanning the whole genome.
We pruned markers by linkage disequilibrium using r 2 ≤0.9 to alleviate LD bias 11 . We extracted 10 PCs and tested the Spearman correlation with the ME and with D2L expression.

Association of the Polygenic Co-expression Index with age
Gene expression profiles are susceptible to changes along development and aging. In this study we kept all post-natal samples in the analysis to obtain greater statistical power. We tested whether our preprocessing pipeline was effective at removing age effects by correlating the expression of the eight genes reported in Table 2 in the main text with age. Then, to ensure that the PCI predicted co-expression in the age range included in our samples, we assessed the PCI/coexpression correlation in samples with restricted age range. As shown in Table 1 in the main text, the samples we studied with fMRI, behavioral and clinical measures included participants between 15 and 55 years old, thus we restricted the Braincloud and BrainEAC samples to this interval.

Participants
All subjects were evaluated with the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders 4th Edition, to exclude any psychiatric disorder. All participants in the imaging study were genome-wide genotyped using microarray technology. Along with the absence of psychiatric conditions, other exclusion criteria were represented by familial risk for psychiatric disorders (first-degree relatives of patients were excluded), history of drug or alcohol abuse, active drug use in the previous year, head trauma with loss of consciousness, presence of metal implants on participants' body, and any relevant medical condition. Moreover, data affected by scanning artifacts or showing excessive movement during the scanning sessions were not further analyzed, leading to exclusion of the participant. Participants who performed so poorly in the cognitive task that their accuracies could not be discriminated from chance level were excluded.
The final sample included 124 healthy, unrelated, Caucasian participants. Participants took part in a separate session aimed to collect socio-demographic and neuropsychological information by means of semi-structured interviews. We used the Edinburgh Handedness Inventory 12 to assess hand preference.

Genotyping
Participants in the imaging study underwent blood withdrawal for subsequent DNA extraction from peripheral blood mononuclear cells. To this aim, approximately 20ml of fresh blood was obtained through a conventional venous blood collection with 10ml EDTA Vacutainer Venous Blood Collection Glass Tubes (Vacutainer ®). Approximately 200 ng DNA was used for genotyping analysis. DNA was concentrated at 50ng/µl (diluted in 10 mM Tris/1mM EDTA) with a Nanodrop Spectrophotometer (ND-1000). We used Illumina HumanHap550K/610Quad Bead Chips (San Diego, California) to genotype our sample. Briefly, each sample was whole-genome amplified, fragmented, precipitated and resuspended in appropriate concentrations of hybridization buffer.
After hybridization, the Bead Chip oligonucleotides were extended by a single labeled base, which was detected by fluorescence imaging with an Illumina Bead Array Reader. Normalized bead intensity data obtained for each sample were loaded into the Illumina GenomeStudio (Illumina, v.2010.1) with cluster position files provided by Illumina, and fluorescence intensities were converted into SNP genotypes. After genotypes were called and the pedigree file was assembled, we removed SNPs showing minor allele frequency <1%, genotype missing rate >5%, or deviation from Hardy-Weinberg equilibrium (p<0.0001). Individuals were also removed if their overall genotyping rate was below 97%. Sample duplications and cryptic relatedness were ruled out through identity-by-state (IBS) analysis of genotype data.

Working memory task
The N-Back task consisted of two runs each with two conditions. Participants did not exit the scanner during the breaks. The first block featured 0-back and 1-back, the second featured 0back and 2-back. The stimuli were arranged in a block design, consisting of eight 30-s blocks: four blocks of the control condition alternating with four blocks of each WM. Each block began with task instructions (2 s) and included 14 task trial (duration: 0.5 s, inter-trial interval: 1.5 s). Stimuli were presented via a back-projection system and behavioral responses were recorded through an optic fiber response box which allowed assessment of accuracy (percentage of correct responses) and reaction time for each trial. Numbers appeared at a fixed location, which was highlighted by a circle within the diamond-shaped box. Participants responded with their right hand with a fourbutton MRI-compatible pad within the scanner.

fMRI data acquisition and analysis
Blood oxygen level-dependent (BOLD) signal was recorded by a GE Signa 3T scanner (General Electric, Milwaukee, WI), equipped with a standard quadrature head coil. A gradient-echo planar imaging sequence, (repetition time, 2000 ms; echo time, 28 ms; 20 interleaved axial slices; thickness, 4 mm; gap, 1 mm; voxel size, 3.75 × 3.75 × 5 mm; flip angle, 90°; field of view, 24 cm; matrix, 64 x 64) was used to acquire 120 volumes while subjects performed the WM task. The first four scans were discarded to allow for equilibration effect. We used the Matlab (Mathworks, Natick, Massachusetts, USA) toolbox Statistical Parametric Mapping 8 (SPM8; http://www.fil.ion.ucl.ac.uk/spm) to analyze the fMRI data using standard procedures.
We realigned the images to correct for motion artifacts. Images of each subject were pre-processed using the Realign and Unwarp function (SPM8), in order to compensate for non-linear signal distortions induced by head motion. Movement parameters were extracted to exclude subjects with excessive head motion (2 mm of translation or 2° of rotation). Realigned images were resliced to a 3.75 mm isotropic voxel size and spatially normalized into a standard space (Montreal Neurological Institute) by using a 12 parameter affine model.
Then, we smoothed the scans (12 mm full-width at half-maximum isotropic). Effects of condition at individual subject level were evaluated producing a t statistical map for the contrasts of interest (1-back versus 0-back and 2-back versus 0-back). Individual contrast images were used for group-level analyses. Finally, we used the ArtRepair Software 13 .
(https://www.nitrc.org/projects/art_repair/) Outlier Detection algorithm to identify participants with scans displaying activity which deviated from the remaining sample. Three subjects (i.e., 2.4% of the data) were excluded in this step.
Accuracy during the scan, i.e., percentage of correct responses, was evaluated to exclude those participants whose performance was not statistically different from chance level and thus they were likely to not have been optimally engaged on the task during the experiment. We computed asymptotic binomial [B(n=54, p=0.25)] confidence (CI) interval at 90% confidence level (CI 90 = 0.15 -0.35). Accuracy cut-off was set at 35% for the 2-back task, namely the upper limit of the CI.
Six participants performing poorer than 35% were thus excluded. No participant performed below 35% at 1-back. Further analyses were conducted on this sample encompassing 124 healthy participants.
Effects of condition at individual subject level were evaluated using a box car model convolved with the hemodynamic response function at each voxel, and a high-pass filter of 128 sec.
Linear contrasts were computed producing a t statistical map for the 1-and 2-Back conditions, assuming the 0-Back condition as baseline. In order to further control for movement and acquisition artifacts, we calculated a bad volume regressor (BVR), using the Bad Volume function included in ArtRepair. For each subject, volumes with artifacts, detected based on scan-to-scan motion > 1mm/TR (>1 degree) and outliers relative to the global mean signal (> 5 standard deviations from global mean) were identified and included in BVR. In addition, a 24-parameter autoregressive model (Friston-24) was calculated, including current and past position parameters, along with the square of each parameter. Both BVR and Friston-24 were included as regressors in the individual first-level statistic.

Association of the Polygenic Co-expression Index with brain activity and behavior during working memory performance
Individual contrast images were used for regression analysis at the group level. We used a repeated measures ANCOVA design with LOAD as within-subjects factor and the genetic score as a regressor of interest. We also modeled the factor gender and included age and score on Edinburgh Inventory as covariates of no interest. We tested the main effect of the genetic score and the interaction of the genetic score with LOAD. We masked results selecting the voxels in which the activity was higher during working memory than the baseline. To this end, we computed onesample t-tests on the whole sample of participants to discover brain regions in which activity was significantly greater during both 1-back and 2-back conditions compared to 0-back baseline (wholebrain α=0.05 FWE-corrected, null conjunction analysis). MNI coordinates of statistical maxima of activation were converted to conform to the standard space of Talairach using the Nonlinear Yale MNI to Talairach Conversion Algorithm (http://noodle.med.yale.edu/~papad/mni2tal). Anatomical localizations were determined using the Talairach Daemon software (http://www.talairach.org/daemon.html).
We asked whether imaging results were robustly associated with the SNPs we found or whether they were driven by a few epistatic SNP interactions. Therefore we cross-validated the SNPs of the PCI for their effect on imaging phenotypes. We obtained 8 new PCIs each including 7 SNPs weighted by association with the ME after exclusion of 1 SNP at each cycle. We then tested the effect of these PCIs on imaging phenotypes. Additionally, we modeled the effects of DRD2 rs1076560 genotype to rule out that the effect of the PCI was mediated by other genetic factors.
Finally, we investigated behavioral performance (accuracy and reaction time) during the WM task in the scanning session. We removed from this analysis two participants with RT<150 ms either in the 1-back or 2-back conditions because such timing could only be achieved by means of preemptive button presses. The final sample for the behavioral analysis included 122 participants. In the second healthy fMRI sample, participants scoring below 70% in the 2-back task were excluded by protocol. Thus, the behavioral variability in this sample was not comparable with that of the first sample, and we did not replicate the behavioral analysis.

Imaging replication in healthy subjects
Participants. Exclusion criteria included history of psychiatric and neurologic disorders and familial risk for psychiatric disorders (none of the subjects had a first-degree relative with a schizophrenia spectrum disorder or received psychotropic pharmacological treatment), history of head trauma, and history of alcohol or drug abuse. All participants had WAIS IQ>80. All fMRI data acquisition and analysis. BOLD fMRI data were acquired on a 3T GE Signa Scanner (Milwaukee, WI) using a gradient-echo echo planar imaging sequence (TR=2000 ms, TE=30 ms, flip angle=90°, field of view=24 cm, matrix=64x64, 24 6mm thick slices). Individual linear contrast images of the 2-back > 0-back conditions entered in a second-level group analysis.
Since 1-back assessments were not available for all subjects we focused on 2-back. The fMRI images were processed following standard procedures in SPM8 (http://www.fil.ion.ucl.ac.uk/spm).
The fMRI images were first co-registered to high-resolution anatomical images and analyzed using Statistical Parametric Mapping 8 (SPM8; http://www.fil.ion.ucl.ac.uk/spm). The data were corrected for head motion artifacts, excluded if motion exceeded 2 mm in translation or 1.5° in rotation (with motion parameters used as covariates of no interest in first level analysis), spatially normalized to a 3 × 3 × 3 mm 3 voxel size into a standard stereotactic space (MNI template) using affine and nonlinear transformation, and then smoothed with an 8mm full-width at half-maximum Gaussian filter. The processed images were analyzed in a two-level procedure. At the first level, separate general linear models were specified for each subject by modeling the alternating task conditions as a box car reference vector that was convolved with the SPM8 standard hemodynamic response function at each voxel. Residual movement was added as a nuisance variable. Data quality was further assessed based on time series signal-to-noise ratio, signal variance, and artifacts like ghosting at each stage of initial analysis. Second-level results were masked for activity at 2-back.

Association of the Polygenic Co-expression Index (PCI) with BOLD response during
working memory performance. The effect of the PCI on BOLD response was tested using robust regression. The model included activation estimates in the clusters obtained through the analysis in the first healthy sample with the same nuisance variables. The PCI was the independent variable.
We tested the positive regression slope following the results of the main experiment. To this aim, we used a ROI approach with the clusters positively associated with the PCI in the first healthy sample and corrected for multiple comparisons by means of FDR 14 (four tests corresponding to the four clusters detected). We extracted parameter estimates from these ROIs and considered results significant at a statistical threshold of FDR-corrected α < .05 (one-tailed).

Imaging study in patients with schizophrenia
All patients had pre-morbid T.I.B. IQ ≥ 90 (mean ± standard deviation (SD): 106 ±7.0; range 92.0-119.2) and were on stable antipsychotic treatment (mean of Gardner equivalents dose ± SD: 680 ± 272; range: 150-1500). We used PANSS to evaluate symptoms severity (mean ± SD: 61.3 ± 20.5; range: 15-104). We report results obtained with the same model used for healthy controls and also with a model in which symptoms severity and equivalents of chlorpromazine were added as nuisance variables in the general linear model.

Participants and clinical protocols
Supplementary Table 2

Statistical procedures
We assessed treatment improvement variables statistical distribution with Shapiro-Wilk test, which turned out significant, i.e., the variables were not normally distributed (first cohort: W= 0.96, p = 0.07; second cohort: W=0.9, p < 0.05). In the association study, we used Spearman's Rho correlation index to associate PANSS total scores and PCI. The PCI of the two cohorts was computed based on genotypes assessed as above reported. In the second clinical cohort, based on the results from the first cohort, we tested to what extent large PCI values predicted greater improvement using Spearman's Rho correlation index (one-tailed probabilities).
In the predictive model, we categorized patients in responders and non-responders. We used 20% symptom improvement as a cut-off 18,19 . We tested the PCI, the treatment dose, and offmedication symptom severity to predict membership of individual patients to response category.
We used ROC curves to estimate sensitivity and specificity and compute the area under the curve (AUC) as an index of the difference between the predictors and chance level (non-parametric test, SPSS). Since sub-sampling the cohorts in the prediction study substantially reduced statistical power, we computed ROC curves on a pooled sample including both cohorts. We marginalized the effect of cohort by identifying variables showing significant changes between the first and the second cohort by means of t-tests and then computed the residuals of these variables using one-way ANOVAs.

WGCNA
We detected 67 sets of co-expressed genes. 2097 genes out of the 23636 were not clustered in any gene set. We computed the module eigengene (ME), i.e., the first principal component, of all genes' expression in each gene set. The median gene set size was 170 (range: 46 -2 452) and the median variance explained by the MEs was 36.2% (range: 24.0% -62.4%).

COMPUTATION
We pooled minor allele carriers for rs6504631 and rs12940715 (minor homozygous respective sample sizes were n=4 and n=3) and we excluded rs6504631 (p-value > 0.05 after pooling subjects). The first principal component and each allelic subpopulation complied with the normal distribution (Shapiro-Wilk test for normality, all p-values > 0.05). By definition of the PCI, participants with greater PCI scores also had greater co-expression levels. Notably, the linear fit between the PCI and the leave-one-out cross-validated PCI was nearly perfect (t 198 = 116, R 2 = 0.99, p = 3.2×10 -183 ; Supplementary Figure 2), suggesting that the association between expression of the gene set and the PCI was not driven by a few influential observations. The leave-one-out crossvalidated PCI was associated with expression of the gene set (ME, t 198 = -9.4, R 2 = 0.31, p = 1.6×10 17 ; Supplementary Figure 2) and D2L transcriptional levels in the left-out subjects (t 198 = -5.2, R 2 = 0.12, p-value = 4.5×10 -7 ). In the leave-one-out cross validation the weights of the SNPs were computed on N-1 subjects and were applied to the N th subject. Therefore, this measure reflects the power of the PCI to approximate D2L gene set co-expression levels in subjects independent of the training dataset. Leave-k-out cross-validation showed that cross-validated PCI was still associated to the ME when the training set was about half (N = 100) of the whole sample (R 2 mean ± SD of 10 000 random permutations = 0.28 ± 0.05, see Supplementary Figure 2). K-fold crossvalidation supported these results, with PCI explaining on average 32% of ME variance (Supplementary Figure 2).
The ME was not associated with ethnicity (p = 0.9), and the PCI by ethnicity interaction was not significant (p = 0.9). When tested independently, the association between the ME and the PCI was comparable between Caucasian (r = 0.69) and African-American individuals (r = 0.53).
We tested the robustness of the SNP selection with three analyses: i. we cross-validated the SNP selection (keeping the original priority pruning) and tested the association between the ME and the PCI computed using the first eight SNPs for each cycle of the cross-validation. We found a strong correlation between the two variables (R 2 = .38, Supplementary Figure 1); ii. we ranked the genes by their most associated gene and compared the ranking with that resulting from 100 randomseed LD pruning. The rankings correlated positively (median r = .35) and were extremely different from the null distribution (Cohen's d = 2.5); iii., we investigated the correlations between the ME and the PCI as a function of the number of SNPs included in the polygenic index. Supplementary   Figure 3a represents the correlation of a series of PCIs with the ME and with DRD2 expression.
Each point indicates the Pearson's correlation coefficient for a cross validated PCI including a given number of SNPs (represented on the x-axis and varying between 1 and 100). With few SNPs, the correlation increases steeply with the inclusion of more markers, but plateaus around 20 SNPs, i.e., there is no effective information increase. This is more evident in Supplementary Figure 3b, which plots the percent correlation variation as a function of the number of SNPs included in the PCI.
Here, the difference between 8 and 9 SNPs is the last variation > 5%. Therefore, including in the PCI a number of SNPs greater than 9 causes no relevant increase in the association between the PCI and the ME. This is exactly the number of SNPs selected for the PCI (one of them was excluded because expression was not normally distributed within all genotypic populations).
These validation steps, together with the findings on the functional role of the SNPs reported in the SI Appendix, support the idea that the PCI we used, which included 8 SNPs, is not biased by overfitting. In further support of these results, Supplementary Figure 4

Association of the Polygenic Co-expression Index with genomic principal components
Supplementary Table 3 shows the variance explained by each PCs extracted, Spearman's Rho and correlation p-values with the ME and D2L expression. We did not find a significant association between the PCs and either variable.
Supplementary Table 3

Association of the Polygenic Co-expression Index with age
After preprocessing, none of the eight module genes proximal to the SNPs in the PCI was significantly associated with age (all p > .35). We investigated whether our PCI predicted the ME even when we restricted age range to the [15,55] years interval. We chose this interval because it is the age interval represented in the fMRI, behavioral, and clinical studies. We found that the R 2 of the association was basically unaffected (Braincloud: all samples, R 2 = .38, restricted age range, R 2 = .35; BrainEAC: all samples with RIN > 6, R 2 = .14, restricted age range R 2 = .17).

Supplementary fMRI analyses in healthy individuals
In the fMRI analysis reported in the main text we adopted a within-subject design to associate the PCI with WM activity. To rule out potential bias introduced by specific statistical models, we additionally computed the average between 1-back and 2-back activations, correlated this activity with the PCI and reported the activations at the same statistical threshold of the withinsubject analysis. Results are largely overlapping (Supplementary Figure 5), for example, here the "A" cluster reported in Table 3

Complete statistics of the behavioral analysis reported in the main text
We performed repeated measures ANCOVAs (within-subjects factor LOAD [1-back, 2back]) on WM accuracy, reaction times, and on an efficiency index (accuracy over reaction times). The latter analysis is reported for exploratory purposes. The analysis on accuracy yielded a significant main effect of LOAD (F 1,120 =43, p<.001). There was no significant effect of PCI (F 1,120 =0.38, p=.54) and no significant LOAD×PCI interaction (F 1,120 =.11, p=.74). The analysis on reaction times yielded a significant LOAD×PCI interaction, surviving Bonferroni correction (two tests [accuracy and reaction times];  = .025; F 1,120 =6.9, p=.01). Supplementary Figure 8 plots the relationship between reaction time at 2-back and PCI. The main effects of LOAD (F 1,120 =2.5, p=.12) and PCI (F 1,120 =1.9, p=.18) were not significant. Post-hoc regressions computed to resolve the significant interaction were non-significant for 1-back (t 120 =.12, adjusted R 2 =-.008, p=.91) but yielded a significant fit for 2-back (t 120 =2.3, adjusted R 2 =.033, p=.024). Greater D2L PCI was related with longer reaction times at 2-back. The analysis on efficiency yielded a significant LOAD×PCI interaction, (F 1,120 =4.4, p=.037). The main effects of LOAD (F 1,120 =.34, p=.56) and PCI (F 1,120 =2.8, p=.13) were not significant. Post-hoc regressions computed to resolve the significant interaction were non-significant for 1-back (t 120 =.40, adjusted R 2 =-.007, p=.69) but yielded a significant fit for 2-back (t 120 =2.3, adjusted R 2 =.035, p=.023). Greater D2L PCI was related with lower efficiency at 2-back.

PHARMACOGENETIC STUDY
We found that the cohorts differed significantly for treatment dose (in chlorpromazine equivalents; t 84 = 5.0, p < .001) and off-medication symptoms (t 84 = 8.6, p < .001). Therefore, we marginalized these variables by the factor cohort. Supplementary Figures 8 and 9 show the

SI DISCUSSION
Beside rs2486064, another SNP included in our PCI is related to the GSK3β pathway: rs851436 is located downstream of OSR1, which codes for a transcription factor that suppresses the expression of cytoplasmic β-catenin 20 . β-catenin has been involved in mechanisms of signaling transduction within the D2R cAMP-independent pathway 21 . Also interesting in a neurogenetic perspective is the Sdk2 neuronal adhesion molecule in which rs12940715 is located. In particular,

Sdk2 binds proteins involved in recruitment of transmembrane molecules, such as Post-Synaptic
Density proteins including PSD-95, which binds neurotransmitter receptors. Notably, the postsynaptic density has been considered a critical cellular district for the pathogenesis of schizophrenia 22 .
In summary, the co-eQTLs identified in the present study are located in genetic regions associated with cAMP-independent D2R signaling, synaptic function, response to stress, and learning. We successfully replicated the findings in independent datasets, which suggests that these co-eQTLs play a critical role in determining inter-individual variability associated with DRD2dependent phenotypes. Furthermore, imaging results were not affected when we cross-validated SNPs for the computation of the PCI, suggesting that the eight SNPs detected independently modulate molecular and imaging phenotypes.

Considerations on the pharmacogenetics studies reported
Medication compliance and drug abuse may play a major role in clinical outcome and/or brain activity, and may share genetic risk factors with schizophrenia. To partially control for medication compliance, in the clinical protocol of the discovery study we included multiple intermediate clinical assessments at days 7, 14, and 28 of treatment. On the other hand, the clinical protocol of the NIH sample was performed while patients were admitted to the hospital. Therefore, in both samples medication compliance is less likely to exert a major confounding role. We also collected information on drug abuse in patients and controls by means of semi-structured interviews, although we have no biochemical evidence of abstinence from drugs available.
Treatment response was generally lower in the replication dataset for several reasons. First, these patients entered the protocol following history of inadequate treatment response, whereas the first clinical included patients who were drug-free or drug-naïve at baseline. Second, treatment duration in the second clinical sample varied between 1 and 6 weeks, while it was always 8 weeks in the first. Third, antipsychotics varied across patients in the second clinical sample, whereas all patients in the first clinical sample were treated with olanzapine in monotherapy. The discrepancy between datasets adds support to the conclusion that the PCI is associated with treatment response, regardless of details of the protocol.

Supplementary Figure 1-Results of the co-eQTL study
In all panels, the solid line represents the regression fit and the dotted lines represent 95% confidence intervals for the regression.  panel d: Leave-k-out cross-validation. Each box represents 10 000 permutations. The PCI was computed based on data from different training samples (size indicated on the X-axis). The association with the ME was permuted to obtain median and dispersion estimates for the boxplot.
R 2 values increase with increasing training information, but remain stable with smaller training samples. Abbreviations: PCI, Polygenic Co-expression Index.

Supplementary Figure 3-Correlation analysis of the cross-validated PCI
a. The scatterplot illustrates the correlation between a series of PCIs computed with variable numbers of SNPs and two dependent variables (the ME and DRD2 expression). b. The plot displays the percent variation of the correlation between the ME and a series of PCIs computed with variable numbers of SNPs. For the sake of clarity the percent change is limited to the first 25 PCIs (the information increase is negligible for more inclusive PCIs). To the right of the ninth PCI there is no increase > 5%.

Supplementary Figure 4-Variation of explained variance based on data quality in BrainEAC
The X-axis represents the minimum RNA integrity (RIN) of samples included. The Y-axis represents the R 2 of the correlation between the Module Eigengene (ME) and the Polygenic Coexpression Index (PCI). The plot shows that the association between the ME and the PCI steadily  The first column reports the probe name in Braincloud. The second column reports the corresponding gene name. The third column contains the probe numbers of BrainEAC. Here, the following legend applies: green: the nucleotide sequence of the Braincloud probe completely overlaps with BrainEAC's exon-specific probe; orange: partial overlap; black: no overlap between the probes (total gene expression in BrainEAC was used in this case); red text: genes not found in BrainEAC. Probes are listed following their ranking in scaled connectivity within the gene set (KWithinScaled). The fourth column reports whole-network connectivity for each gene (kTotal). The kTotal of the ith gene is defined as the sum of the adjacency matrix values a ij where the j-th genes encompass all network genes. The fifth column reports intra-modular connectivity (kWithin). The kWithin of the i-th gene is defined as the sum of the adjacency matrix values a ij where the j-th genes encompass D2L gene-set genes. The sixth and seventh columns report extra-modular connectivity (kOut), that is merely kTotal minus kWithin and the difference between kWithin and kOut (kDiff). The eighth and ninth columns report scaled values of kWithin and kTotal (i.e., kWithin i /kWithin MAX , and kTotal i /kTotal MAX ). We excluded one at a time each SNP from the computation of the PCI (see first column) and computed statistics of the fit of the resulting Polygenic Co-expression Index (PCI) with fMRI activation. Data were thresholded at topological FDR-corrected q < .05 to allow unbiased comparison of cluster extents.