Introduction

Substance use traits (SUTs), including smoking initiation (Smk), alcohol use disorder (AUD), cannabis use disorder (CUD), and opioid use disorder (OUD), are highly prevalent and leading causes of morbidity and mortality [1,2,3]. An estimated 40–60% of the risk of SUTs is attributable to genetic factors [4,5,6]. Recent large-scale genome-wide association studies (GWAS) of SUTs have implicated underlying biological systems [7,8,9,10]. Despite the growing success of GWAS in identifying associated risk polymorphisms, many of the identified variants are intronic or intergenic [11] and exert only small phenotypic effects, suggesting that their effects are mediated by the regulation of transcription.

Recent efforts to understand how genetic variation influences gene transcription and contributes to disease have led to the development of analytic frameworks such as functional summary-based imputation (FUSION) [12], S-PrediXcan [13], summary data-based Mendelian randomization (SMR) [14], and Coloc [15]. These frameworks utilize a transcriptome-wide association study (TWAS) approach, integrating external gene expression reference data and GWAS results to prioritize genes whose cis-regulated expression is associated with disease phenotypes.

To facilitate the identification of genes with cis-regulated expression profiles, TWASs have been conducted for cigarette smoking [11], cocaine dependence [11], problematic alcohol use (PAU) [8], and OUD [10]. Although these TWASs shed light on potential mechanisms through which genetic loci associated with SUTs exert their effects, the evidence they provide of expression quantitative trait loci (eQTL) effects is at the level of messenger RNA (mRNA), rather than protein abundance. Genetic variation can influence protein abundance by altering the rate and stability of gene expression [16], though it is an empirical question whether the identified genetic loci exert their effects on SUTs by modulating protein abundance in the brain. This question matters because proteins, as the final products of gene expression, are the main functional components of cells and biological processes [17], and comprise most drug targets and biomarkers [17, 18].

The current study examines whether loci identified through GWAS contribute to the pathogenesis of SUTs by modulating protein abundance. We applied an integrative proteome-wide association study (PWAS) approach that combines genetic data from 4 large GWASs of SUTs [7,8,9,10] with 2 independent human brain proteomic datasets (Banner [19] and ROSMAP [20]) derived from brain dorsolateral prefrontal cortex (dPFC). To compare the effects of risk variants at both the proteomic and transcriptomic levels, we also performed TWAS using the CommonMind Consortium (CMC) dPFC [21] and Genotype-Tissue Expression (GTEx) v7 frontal cortex [22] datasets. See Fig. 1 for a study overview.

Fig. 1: Overview of the study.

GWAS summary statistics included in the study were based on 4 substance use traits (SUT): smoking initiation (Smk), alcohol use disorder (AUD), cannabis use disorder (CUD) and opioid use disorder (OUD). For PWAS, human brain proteomes from Banner (discovery) and ROSMAP (validation) datasets were integrated with each set of GWAS summary statistics. TWAS based on brain eQTL datasets from discovery (CommonMind Consortium - CMC) and validation (Genotype-Tissue Expression - GTEx) datasets was conducted for each SUT. TWAS splicing expression analysis (CMC - sQTL) was also performed for all 4 traits. N denotes the actual sample size of studies included in each analysis. Colocalization analysis was based on nominally significant proteins and transcripts for Smk and proteins for AUD. All significant proteins (after Bonferroni correction) were used as input for drug-gene interaction analysis for all 4 traits. Only proteome-wide significant proteins for Smk and AUD were included for protein-protein interaction analysis.

Methods

Genome-wide association studies summary statistics

We selected the largest GWASs of SUTs available as of July 2021; all were of European ancestry (EUR) to match the proteomic datasets. Summary statistics were derived from n = 1,232,091 for Smk [7]; n = 313,959 for AUD [8]; n = 384,032 for CUD [9]; and n = 302,585 for OUD [10]. Sample demographics and methods for phenotyping, data processing, and statistical analyses from the original articles [7,8,9,10] are summarized in Supplementary Table 1.

Human brain pQTL data

We obtained human brain proteomic data from the study by Wingo et al. [23, 24], in which human protein abundance was quantified in the dorsolateral prefrontal cortex (dPFC) of post-mortem brain tissues from 152 (Banner dataset) [19] and 376 (ROSMAP dataset) [20] EUR participants. Characterizing genetic control of the proteome in these human brain datasets identified 1,139 and 1,475 protein quantitative trait loci (pQTL) (i.e., protein weights) in the Banner and ROSMAP datasets, respectively [23, 24], which were downloaded from https://doi.org/10.7303/syn23627957. Sample demographics, proteomic sequencing, quality control, and normalization from Wingo et al. [23, 24] are summarized in Supplementary Table 1.

Proteome-wide association analysis

To identify proteins whose genetically regulated expression is associated with SUTs, we performed PWAS analyses by integrating GWAS summary statistics of SUTs and pQTLs from discovery (Banner) and validation (ROSMAP) datasets using the FUSION pipeline (http://gusevlab.org/projects/fusion/) [12]. FUSION fits 5 predictive models (top1, GBLUP, LASSO, Elastic Net, BSLMM) to compute protein expression weights from the combined effects of single nucleotide polymorphisms (SNPs). The model with the largest cross-validation R² was selected for downstream analyses. PWAS association statistics were Bonferroni corrected based on the number of proteins in the analysis (see footnote in Table 1 and Supplementary Table 3).
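The two selection rules in this paragraph (keep the model with the largest cross-validation R², then apply a Bonferroni threshold based on the number of proteins tested) can be illustrated with a minimal Python sketch. The R² values and the function names below are hypothetical and chosen only for illustration; the count of 1,475 is the number of ROSMAP protein weights reported above, and this is not the FUSION implementation itself.

```python
# Hypothetical cross-validation R^2 values for a single protein (illustrative only).
cv_r2 = {"top1": 0.041, "blup": 0.055, "lasso": 0.072, "enet": 0.078, "bslmm": 0.069}


def best_model(cv_r2_by_model):
    """Return the name of the model with the largest cross-validation R^2."""
    return max(cv_r2_by_model, key=cv_r2_by_model.get)


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


selected = best_model(cv_r2)
threshold = bonferroni_threshold(1475)  # 1,475 ROSMAP protein weights (from the text)
print(selected, f"{threshold:.2e}")  # enet 3.39e-05
```

With 1,475 tests the sketch gives a threshold of about 3.39 × 10⁻⁵, which agrees with the validation threshold quoted in the caption of Fig. 2.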

Table 1 Results of the PWAS of smoking initiation.

Human eQTL data

Human brain transcriptome data, used as expression reference panels, were obtained from the CMC [21] and GTEx frontal cortex v7 [12, 22]. The CMC dPFC weights, which comprise gene-level (eQTL, n = 452) and intron-level (splicing, sQTL, n = 452) expression profiles [21], and the GTEx frontal cortex weights (eQTL, n = 118) were downloaded from the FUSION website (http://gusevlab.org/projects/fusion/) [12].

To examine the association between the cis component of gene expression and SUTs, we performed a transcriptome-wide association analysis (TWAS) using the FUSION package [12]. TWAS was performed using gene and splicing expression profiles measured in adult dPFC and gene expression profiles from frontal cortex. For both PWAS and TWAS, we applied the default parameters in FUSION. To explore whether there was significant enrichment in the genes identified in the PWAS and TWAS, we created proteome-wide significant (PWS) and transcriptome-wide significant (TWS) gene sets. For each gene set, we created a 2-by-2 table to compare the number of listed genes belonging to the gene set to the number of non-members and tested the significance of over-representation using a binomial approximation (p < 0.05) [25].
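The over-representation test can be sketched in Python as a 2-by-2 table plus an upper-tail binomial probability. This is a sketch under stated assumptions, not the procedure from [25]: the null probability is assumed to be the fraction of background genes that belong to the gene set, the function names are hypothetical, and all counts in the example are invented for illustration.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def overlap_enrichment(n_overlap, n_listed, n_set, n_background):
    """Build the 2-by-2 table and test over-representation of the overlap.

    Assumption: under the null, each listed gene falls in the gene set with
    probability n_set / n_background.
    """
    table = [
        [n_overlap, n_listed - n_overlap],  # listed genes: in set / not in set
        [n_set - n_overlap, n_background - n_listed - n_set + n_overlap],  # other genes
    ]
    p_null = n_set / n_background
    return table, binomial_upper_tail(n_overlap, n_listed, p_null)


# Hypothetical counts: 4 of 21 listed genes fall in a 12-gene set, 1,000 background genes.
table, p_value = overlap_enrichment(n_overlap=4, n_listed=21, n_set=12, n_background=1000)
print(table, p_value < 0.05)
```

Using only the standard library keeps the sketch self-contained; a statistics package would normally be preferred for large counts.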

Colocalization of PWAS and TWAS associations

To explore plausible causal relationships between GWAS variants and proteome- or transcriptome-wide associations, we performed colocalization analysis using the coloc R package (version 3.2-1) [15] in FUSION [12]. We used the FUSION parameter (--coloc_P 0.05) for inclusion of nominally significant proteins/genes (at p < 0.05) and performed colocalization based on the GWAS and pQTL (ROSMAP and Banner) [23, 24], eQTL (CMC and GTEx) [21, 22] and sQTL (CMC) [21] data. A posterior probability of colocalization (PP4) ≥ 80% was used to denote a shared causal signal.
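As a minimal illustration of the 80% posterior-probability rule described above, the following Python sketch keeps only genes whose colocalization posterior probability meets the threshold. The gene labels and probabilities are placeholders, not study results, and the function name is hypothetical.

```python
PP4_THRESHOLD = 0.80  # posterior probability used in the text to denote a shared causal signal


def colocalized_genes(pp4_by_gene, threshold=PP4_THRESHOLD):
    """Return the sorted gene labels whose posterior probability meets the threshold."""
    for gene, pp4 in pp4_by_gene.items():
        if not 0.0 <= pp4 <= 1.0:
            raise ValueError(f"posterior probability for {gene} must be in [0, 1]")
    return sorted(gene for gene, pp4 in pp4_by_gene.items() if pp4 >= threshold)


# Placeholder values for illustration only.
example_pp4 = {"GENE_A": 0.93, "GENE_B": 0.41, "GENE_C": 0.80}
shared_signal = colocalized_genes(example_pp4)
print(shared_signal)  # ['GENE_A', 'GENE_C']
```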

Conditional analysis and TWAS fine mapping

Conditional analyses were performed for PWS and TWS genes using FUSION [12] to determine whether multiple gene-trait associations within given risk loci are independent after adjusting for the predicted expression of other genes in the region. Genes that overlapped within the 1.5-Mb boundary were merged into a single locus. This allowed us to estimate the extent to which functional associations identified in the PWAS/TWAS explain the GWAS signals within the same locus. The “variance explained” was calculated using the formula R² = 1 − (χ²_conditioned / χ²_unconditioned), where χ²_conditioned and χ²_unconditioned are the χ² statistics of the conditioned and unconditioned GWAS associations, respectively [12, 26]. To identify genes within regions of association that are likely causal, we performed TWAS fine-mapping using FOCUS [27]. A posterior inclusion probability (PIP) > 0.5 indicated which genes within a region are most likely causal.
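The variance-explained formula can be worked through with a short Python sketch. It assumes that each χ² statistic has one degree of freedom and equals the squared z-score of the lead GWAS association; the z-scores and the function name are hypothetical.

```python
def variance_explained(z_unconditioned, z_conditioned):
    """R^2 = 1 - chi2_conditioned / chi2_unconditioned.

    Assumption: each chi-square statistic has 1 degree of freedom and is the
    square of the corresponding z-score.
    """
    chi2_unconditioned = z_unconditioned**2
    chi2_conditioned = z_conditioned**2
    if chi2_unconditioned == 0:
        raise ValueError("unconditioned association must be non-zero")
    return 1 - chi2_conditioned / chi2_unconditioned


# Hypothetical locus: the GWAS z-score drops from 6.0 to 1.5 after conditioning
# on the predicted expression of the gene.
r2 = variance_explained(z_unconditioned=6.0, z_conditioned=1.5)
print(r2)  # 0.9375
```

In this invented example, conditioning on predicted expression removes about 94% of the GWAS signal at the locus.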

Drug-gene interaction

We examined interactions of proteins identified in the PWAS with prescription drugs using the Drug Gene Interaction Database (DGIdb) v3.0 (https://www.dgidb.org) [28]. We categorized each identified prescription drug using the Anatomical Therapeutic Chemical (ATC) classifications obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG: https://www.genome.jp/kegg/drug/).

Protein-protein interaction

We used STRING database v11.0 [29] to assess whether PWAS genes were enriched for direct protein-protein interactions (PPIs) with discovery and validation PWS genes for Smk and AUD as input. STRING reports the confidence level for observed PPI as follows: low confidence: <0.4; medium: 0.4–0.7; high: >0.7. We used a STRING cut-off score >0.4 to define PWS genes within the observed PPI network as having the highest degree of network connections and a whole genome reference model in STRING to determine whether the number of identified PPIs was significantly enriched.
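The confidence bands quoted above can be expressed as a small Python helper. This is an illustrative sketch with a hypothetical function name: scores are assumed to lie between 0 and 1, and a score of exactly 0.4 or 0.7 is assigned to the medium band, which is an assumption because the text does not state how boundary values are handled.

```python
def string_confidence(score):
    """Map a STRING interaction score in [0, 1] to the confidence band used in the text."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < 0.4:
        return "low"
    if score <= 0.7:  # boundary handling at 0.4 and 0.7 is an assumption
        return "medium"
    return "high"


# Interaction scores reported in the Results for the two direct PPI pairs.
for score in (0.567, 0.986):
    print(score, string_confidence(score))
# 0.567 medium
# 0.986 high
```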

Comparison with previous literature

Using MAGMA gene-based association analysis and FUMA functional gene mapping analysis [30], we explored whether the PWS and TWS genes for the 4 SUTs were associated in the corresponding GWAS. For each genome-wide significant (GWS) independent lead SNP, we also determined whether any gene in the nearby region (within a 1-Mb window) was associated with any SUT in the original GWAS using other gene-based analyses (e.g., DEPICT, PASCAL). Lastly, we compared our findings with other literature that (a) examined differentially expressed genes/proteins in the postmortem dPFC of subjects with an SUT [31,32,33,34], (b) predicted gene/protein expression in the dPFC (via TWAS and S-PrediXcan) for the corresponding SUTs [8,9,10,11], or (c) were identified by TWAS-hub (http://twas-hub.org/) [35]. Genes/proteins not previously reported in the GWAS and PWAS/TWAS are considered “novel”.

For PWS and TWS genes that were detected in SUT GWAS and associated TWAS, we explored associations with psychiatric and other brain-related traits using the GWAS catalog (https://www.ebi.ac.uk/gwas/). To determine whether there were shared regulatory architectures underlying susceptibility, we also compared our findings with PWAS and TWAS of other psychiatric [12, 23, 24, 36,37,38] and neuroimaging traits [39].

Results

PWAS identifies brain proteins related to genetic risk for Smk and other SUTs

We used the FUSION pipeline to integrate pQTL and GWAS results and identify proteins whose abundance is correlated with genetic risk for the 4 SUTs [7,8,9,10]. In the discovery stage (using the Banner dataset), we identified 13 proteome-wide significant (PWS) genes for Smk (Table 1, Fig. 2A) and 1 PWS gene each for AUD and CUD (Supplementary Table 2; Fig. 2A). No gene was PWS for OUD in the Banner dataset (Fig. 2A). Using the ROSMAP dataset, we identified 15 PWS genes for Smk (Table 1, Fig. 2B), 3 PWS genes for AUD (Supplementary Table 2, Fig. 2B), and 1 PWS gene each for CUD and OUD (Supplementary Table 2, Fig. 2B).

Fig. 2: PWAS identified 27 genes and replicated 6 genes for substance use traits.

A Manhattan plot for smoking initiation (Smk), alcohol use disorder (AUD), cannabis use disorder (CUD) and opioid use disorder (OUD) in the discovery proteome dataset. B Manhattan plot for Smk, AUD, CUD and OUD in the validation proteome dataset. Each dot denotes a gene, and the y-axis shows the strength of association (−log10 p-value). The proteome-wide significance level was Bonferroni corrected p < 4.36 × 10⁻⁵ for the discovery dataset and Bonferroni corrected p < 3.39 × 10⁻⁵ for the validation dataset. Validated genes are in bold letters.

We next compared the PWS genes implicated in the discovery and validation stages for Smk, AUD, and CUD. Of the 13 high-confidence Smk PWS genes identified in the discovery dataset (Table 1, Fig. 2A), 7 were PWS (NT5C2, GMPPB, NQO1, SRR, RHOT2, ACTR1B, and BTN2A1) and 2 (BTN3A3 and WIPI2) were nominally significant in the validation dataset (Table 1). The gene identified for AUD in the discovery dataset (CTNND1) was also PWS in the validation dataset (Supplementary Table 2) and the gene (GMPPB) that was PWS in the discovery cohort for CUD was nearly PWS (p = 3.83 × 10⁻⁵) in the validation dataset (Supplementary Table 2).

In addition to the validated genes, 12 SUT genes were PWS in the ROSMAP dataset only, including 8 genes for Smk (Table 1, Fig. 2B), 2 genes for AUD, and 1 gene each for CUD and OUD (Supplementary Table 2, Fig. 2B). Of the 12 SUT risk genes identified in the validation stage, 1 was nominally significant for Smk (PRKCD, p = 3.94 × 10⁻²) (Table 1) and 1 for AUD (SLC5A6, p = 7.92 × 10⁻⁴) (Supplementary Table 2) in the discovery stage. Thus, by combining 2 independent human brain proteomic and SUT GWAS datasets, we identified 27 potential SUT risk loci through their effects on brain proteomic abundance.
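The total of 27 follows from the per-trait counts reported in this section: for each trait, the number of distinct PWS genes is the discovery count plus the validation count minus the number that were PWS in both datasets. The Python sketch below reproduces that arithmetic, assuming the total counts gene-trait associations (GMPPB, for example, is reported for both Smk and CUD). The variable names are hypothetical.

```python
# Per-trait PWS counts taken from the Results text.
discovery = {"Smk": 13, "AUD": 1, "CUD": 1, "OUD": 0}   # Banner
validation = {"Smk": 15, "AUD": 3, "CUD": 1, "OUD": 1}  # ROSMAP
in_both = {"Smk": 7, "AUD": 1, "CUD": 0, "OUD": 0}      # PWS in both datasets

# Distinct gene-trait associations per trait (inclusion-exclusion).
distinct = {t: discovery[t] + validation[t] - in_both[t] for t in discovery}
# PWS in ROSMAP only.
validation_only = {t: validation[t] - in_both[t] for t in validation}

print(distinct, sum(distinct.values()))
# {'Smk': 21, 'AUD': 3, 'CUD': 2, 'OUD': 1} 27
print(validation_only, sum(validation_only.values()))
# {'Smk': 8, 'AUD': 2, 'CUD': 1, 'OUD': 1} 12
```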

PWAS and TWAS overlap reveals high-confidence genes associated with SUT risk

To identify SUT PWS risk genes with evidence of transcriptional regulation, we examined the extent of overlap at the protein and transcript levels. Specifically, we performed eQTL- and sQTL-based TWAS, followed by an analysis of the overlap between the TWS and PWS genes. For the discovery TWAS, we integrated SUT GWAS summary statistics [7,8,9,10] and CMC dPFC eQTL datasets [21]. After Bonferroni correction, we detected 48 genes (38 for Smk, 6 for AUD, 2 for CUD, and 2 for OUD) whose cis-regulated expression was significantly associated with SUTs (Supplementary Table 3, Supplementary Fig. 1A–D). To validate these results, we also conducted TWAS using the GTEx frontal cortex eQTL dataset [22], which identified 50 genes (42 for Smk, 4 for AUD, 2 for CUD, and 2 for OUD) with TWS associations (Supplementary Table 3, Supplementary Fig. 2A–D).

Notably, 12 risk genes for Smk (AS3MT, C10orf32, CPSF4, SFMBT1, SRR, ITIH4, TYW5, GPX1, CCDC88B, HYAL3, CNTROB, and NAT6) from the discovery stage also showed TWS associations in the validation stage (Supplementary Table 3, Supplementary Fig. 2B). For CUD, 2 TWS genes (HYAL3 and NAT6) were validated (Supplementary Table 3, Supplementary Fig. 2B) while no TWS gene was validated for AUD or OUD (Supplementary Table 3, Supplementary Fig. 2C, D).

We next compared the validated TWS eQTL genes with PWS genes (from both the discovery and validation stages) to identify significant overlap in SUT risk genes. For Smk, 4 of 21 genes (SRR, TYW5, C10orf32, and NAT6) that were PWS were confirmed by TWAS (Table 1, Supplementary Fig. 3A) and shown to have significant overlap (binomial test p = 3.58 × 10⁻⁶). NAT6, one of the 2 PWS genes for CUD (Supplementary Table 2), was confirmed by TWAS (Supplementary Table 3). We did not run the binomial test for CUD due to the small number of genes. No PWS gene was supported by TWAS for AUD (Supplementary Fig. 3B) or OUD (Supplementary Tables 2 and 3).

At the level of splicing, we detected significant overlap between TWS sQTL genes and PWS genes for Smk (binomial test: p = 2.2 × 10⁻¹⁶; NT5C2, NQO1, and MCTP1) (Table 1, Supplementary Table 4). No overlapping genes were identified for AUD, CUD, or OUD.

In summary, TWAS provided substantial evidence linking Smk to expression changes in SRR, TYW5, C10orf32, and NAT6, and splicing of NT5C2, NQO1, and MCTP1.

Colocalization of PWAS and TWAS genes

We explored whether the validated genes in the proteomic (7 for Smk and 1 for AUD) and transcriptomic (7 for Smk and 2 for CUD) analyses showed evidence of a causal effect on SUTs. Colocalization analysis showed strong evidence of a shared causal variant for Smk in 6 PWS genes (NT5C2, GMPPB, NQO1, RHOT2, SRR, and ACTR1B) and 6 TWS eQTL genes (AS3MT, TYW5, CCDC88B, CNTROB, SRR and C10orf32) (coloc posterior probability (PP4) ≥ 80%; Supplementary Tables 5 and 6). We also found evidence of colocalization for the validated AUD PWS gene (CTNND1) (Supplementary Table 8) and the CUD TWS (NAT6 and HYAL3) genes (PP4 ≥ 80%; Supplementary Table 9). Thus, the same risk variants may drive the associations between SUTs and both PWAS (for AUD and Smk) and TWAS eQTL (for CUD and Smk). No TWS sQTL genes showed evidence of colocalization for Smk (Supplementary Table 7).

Conditional analysis and TWAS fine mapping

To capture independent significant signals in the PWAS and TWAS, we conducted conditional analyses using FUSION [12]. Of the 27 PWS genes, 25 were independently significant and 2 marginally significant (Supplementary Table 10). Validated proteins with strong colocalization evidence in Smk (NT5C2, GMPPB, NQO1, RHOT2, SRR, and ACTR1B) and AUD (CTNND1) remained independently associated. Conditional analyses also showed that expression changes in these 7 proteins explained most of the gene-trait signals of their loci in both the discovery and validation datasets (Supplementary Table 10).

We also observed multiple TWS genes residing within shared loci in specific genomic regions: 30 in the eQTL discovery dataset (23 for Smk, 4 for AUD, 1 for CUD and 2 for OUD), 31 in the eQTL validation dataset (25 for Smk, 3 for AUD, 1 for CUD and 2 for OUD) (Supplementary Table 11) and 28 in the sQTL dataset (Supplementary Table 12). Of the 8 validated and colocalized TWS genes (6 for Smk and 2 for CUD), 5 (4 for Smk and 1 for CUD) were independently associated (Supplementary Table 12), including 2 (SRR and TYW5) with evidence of overlap at the proteomic level (Table 1).

Using FOCUS [27] for TWAS fine-mapping, we identified 28 eQTLs and 13 sQTLs with PIP > 0.5 (Supplementary Table 13). Of these, 3 genes (SRR, CNTROB, and CCDC88B) were also supported by colocalization analysis, strengthening the evidence that they are likely causal for SUT risk.

Drug-gene and protein-protein interaction

To identify potential repurposing opportunities targeting risk-related proteins, we queried all SUT risk genes (n = 27) detected by PWAS for an interaction with prescription medications via DGIdb. We observed 33 interactions involving 5 genes (SRR, PRKCD, PLD1, NT5C2, and NQO1) (Fig. 3, Supplementary Table 14). SRR, which showed significant associations with Smk in PWAS and eQTL TWAS, was prioritized as a potential target of serine and pyridoxal phosphate in the antimycobacterials and vitamins drug classes, respectively. DGIdb also prioritized NQO1, a Smk risk gene in the PWAS and sQTL TWAS analysis, as a target for 15 drug interactions, including analgesics (acetaminophen) and antiepileptics (cannabidiol).

Fig. 3: Drug-gene interaction prioritized 5 genes.

Chord diagram of proteome-wide significant genes for SUTs and the Anatomical Therapeutic Chemical classification of drugs. Each gene is linked with drug classes and the width of each line is determined by the number of drugs in each class known to interact with each gene.

Direct PPI was identified for 2 protein pairs (C10orf32 and NT5C2, interaction score = 0.567; MAP1LC3A and WIPI2, interaction score = 0.986) (Supplementary Table 15). However, these PPIs were not significantly enriched (p = 0.33), possibly due to the small number of proteins (N = 21) included in the PPI analysis or the limited proteomic reference information in the STRING database.

Comparison with previous literature

To ascertain novelty, we compared our findings with gene-based analyses in the corresponding GWAS [7,8,9,10] and in MAGMA, as implemented by FUMA [30]. Most (16/27) of the PWS genes were identified by gene-level analysis (9/27), as independently GWS (or within a 1-Mb boundary) (1/27), or both (6/27) (Supplementary Table 16). The 11 genes uniquely identified by PWAS included 7 for Smk (SRR, GMPPB, C10orf32, RFT1, BTN3A3, WIPI2, and HEBP1), 2 for AUD (SLC5A6 and TYW5) and 2 for CUD (GMPPB and NAT6). Similarly, 50 of the 83 TWAS eQTL associations for SUTs can be detected in GWAS, leaving 33 novel genes (Supplementary Table 16). For TWAS sQTL, 12 novel SUT associations were identified, with 35 overlapping genes in the original GWAS (Supplementary Table 16).

Compared with previous studies of predicted gene expression by smoking status [35] (TWAS FUSION), PAU [8] (S-PrediXcan), alcohol consumption [11] (S-PrediXcan), CUD [9] (S-PrediXcan) and OUD [10] (S-PrediXcan), our results overlapped with 33 genes (25 for Smk, 5 for AUD, 2 for CUD and 1 for OUD) (Supplementary Table 17). Moreover, we identified novel genes for Smk (n = 42), AUD (n = 5) and OUD (n = 3). None of the TWAS findings were reported in the previous studies of altered gene expression in individuals by current smoking status [31], or AUD [32, 33] or OUD [34] diagnoses.

To detect shared regulatory effects beyond SUTs, we examined the PWS and TWS genes for association with other psychiatric or brain-related traits from published GWAS (using the GWAS catalog), PWAS, and TWAS [12, 23, 24, 35,36,37,38]. Most of the top SUT proteins (and genes) also harbor genetic risk loci for psychiatric traits (including schizophrenia, neuroticism, depression, autism, and anxiety) and neurocognitive phenotypes (Supplementary Table 18). Among the 27 PWS genes identified in this study, 4 (ACTR1B, BTN3A3, BTN2A1 and TYW5) overlapped with schizophrenia, 2 (CTNND1 and GMPPB) with depression, and 1 each with PTSD, neuroticism, amyotrophic lateral sclerosis, attention deficit hyperactivity disorder, and bipolar disorder (Supplementary Fig. 4, Supplementary Table 19). Further, 23 TWS genes were identified in 1 or more psychiatric traits and 8 were associated with 7 neuroimaging traits (Supplementary Table 19). Thus, because SUT risk genes are commonly associated with neuropsychiatric diseases, they could play a role in effecting changes in brain structure.

Discussion

We sought to characterize the effects of genetic risk loci on the proteomic architecture of 4 SUTs [7,8,9,10] using PWASs that integrated human brain pQTL data [23, 24] with GWAS results. We identified 27 SUT risk genes, of which 8 (CTNND1 for AUD and NT5C2, GMPPB, NQO1, SRR, RHOT2, ACTR1B and BTN2A1 for Smk) showed PWS associations in 2 independent brain proteomes. Notably, of the 8 validated PWS genes, evidence for 6 (CTNND1 for AUD and NT5C2, GMPPB, NQO1, SRR, and ACTR1B for Smk) was consistent with a causal effect based on colocalization analysis, with all genes reflecting independent associations. Moreover, when adjusting for the effect of protein expression on GWAS signal, SNP-trait associations were largely explained by PWAS associations, suggesting that these genes could confer SUT risk by modulating protein abundance in the brain.

Three of the potentially causal genes identified at the protein level (SRR, NT5C2, and NQO1) showed significant associations with Smk at the transcript level. TWAS also identified 12 additional genes with evidence for eQTL validation, of which 2 (C10orf32 and NAT6) showed PWS associations in 1 of the proteomic datasets for Smk and 1 for CUD (NAT6, an eQTL in the CUD GWAS [9]) (Table 1). The large number of PWS and TWS genes for Smk likely reflects the large sample size (about quadruple that of the other traits) and associated greater statistical power and number of GWS loci in the Smk discovery GWAS [7]. Although the number of TWS genes exceeds that for PWS genes identified for Smk, only about 30% of PWS genes overlap at the transcript level. This has been seen in prior PWASs of psychiatric disorders [23, 37] and supports prior conclusions that mRNA transcript levels can explain one-third to two-thirds of the variance in steady-state protein abundance [40, 41]. Moreover, mRNA and protein abundance levels are weakly correlated [42] and have different genetic architectures [43]. Because gene expression is not a perfect proxy for protein expression [42], studying brain proteins directly can provide novel insights into the impact of genetic variation.

Our Smk PWAS prioritized SRR as an independent risk gene, with causal effects at both the proteomic and transcriptomic levels. SRR encodes serine racemase, which converts L- to D-serine, a co-activator of N-methyl-D-aspartate receptors (NMDAR), which are key components of glutamatergic synaptic signaling in the brain [44, 45]. SRR variants were linked with prognosis in a small, candidate-gene study of methamphetamine-induced psychosis [46] and in a large GWAS of schizophrenia [47]. Deletion of SRR in mice reduces the cortical level of D-serine [48] and reduces NMDAR activation [49], which is associated with a reduced ability to extinguish conditioned responses to amphetamine [50] and cocaine-associated stimuli [51]. The association between smoking initiation and reduced SRR protein and transcript expression seen here could, through a shared risk variant, mediate the adaptive processes involved in smoking initiation by altering NMDAR-dependent neurotransmission.

Drug-gene interaction results identified pyridoxal phosphate (PLP), in the vitamins drug class, as a cofactor for SRR, highlighting preclinical evidence that the human SRR is PLP-dependent [52,53,54]. As the metabolically active form of vitamin B6, PLP binds to SRR and stimulates NMDAR signaling, which is involved in brain metabolism and cellular antioxidant defense [52, 53, 55]. Although a link between cigarette smoking and reduced vitamin B6 and PLP levels has been previously documented [56,57,58], a mechanistic role for SRR in this association is not established. Functional studies that investigate SRR as a druggable gene target for PLP enzyme activity following smoking exposure could elucidate mechanistic pathways and enhance the development of novel treatments for smoking.

This study should be interpreted in the context of its limitations. First, the small sample from which the brain proteome reference dataset was derived contributed to an imbalance between pQTLs and eQTLs/sQTLs, thus limiting our ability to capture the full spectrum of genetic effects on the proteome and transcriptome. This is reflected in the disparity between PWAS and TWAS results, with larger samples in TWAS (CMC eQTL = 48/5419, GTEx eQTL = 50/3106, and CMC sQTL = 48/7771) providing greater statistical power to detect gene expression than in the PWAS (Banner = 15/1139; ROSMAP = 20/1475). Future PWAS of SUTs will require larger brain proteome datasets to permit better pQTL detection. Second, although we identified PWS and TWS genes and highlighted their plausible mechanistic pathways, we could not pinpoint the causal variants associated with these genes. Efforts are needed to map the causal variants and functionally validate the described pathways in the context of SUTs. Third, only EUR individuals were included in the study due to the lack of available data for analyses in other population groups, which limits the generalizability of our findings. Future PWAS of SUTs that include diverse population samples will require more reference data from those populations. Fourth, our power varied greatly by trait, with Smk being by far the best powered trait; the results reflect this disparity.

In conclusion, using PWAS, we identified 6 high-confidence genes that modulate brain protein abundance, potentially altering biological pathways linked to the pathogenesis of SUTs. These genes, as potentially modifiable targets for the development of medications and biomarkers for SUTs, warrant further investigation. More broadly, the approach applied here has considerable potential to advance precision medicine efforts in diagnosing and treating SUTs.