Introduction

Medical illness and psychiatric disorders, including substance use disorders (SUDs), frequently co-occur. Individuals with chronic medical conditions are more likely to have a co-occurring SUD or psychiatric diagnosis [1,2,3,4,5] and over 9 million U.S. adults have a psychiatric disorder that co-occurs with an SUD [6]. The development of a comorbid disorder can exacerbate pre-existing conditions [7, 8] and worsen an individual's prognosis [9, 10]. Moreover, co-occurring disorders can limit treatment options [11] and adversely affect treatment outcomes by reducing treatment adherence or decreasing its effectiveness [12,13,14]. Understanding the genetic underpinnings of comorbid disorders could improve their diagnosis, treatment, and ongoing management, thus informing precision medicine efforts.

Genetic liability for medical and psychiatric disorders has been discovered using genome-wide association studies (GWAS), which identify associations between common genetic variants and the trait of interest. These studies have identified pleiotropic variants, i.e., those associated with multiple conditions. GWAS findings have also demonstrated significant genetic correlations between SUDs and other psychiatric disorders [15, 16] and medical conditions [17]. These findings contribute to a growing body of evidence that shared genetic risk loci or common biological pathways may underlie co-occurring conditions.

Polygenic scores (PGS) provide a measure of an individual's genetic risk for specific traits and as such are a complementary method to investigate genetic overlap. Previous studies have shown that PGS are associated with conditions such as cardiovascular disease [18], kidney disease [19], opioid use disorder [20], depression [21], and pain [22], among many others. PGS may also be used in phenome-wide association studies (PheWAS) [23] to provide insight into the pleiotropic nature of genetic liability for disorders [24, 25]. PheWAS, which have commonly been implemented using electronic health record (EHR) databases, test the association of a PGS for a disorder with multiple phenotypes in a hypothesis-free paradigm.

Here, we used the Yale-Penn sample, which comprises a diverse group of participants recruited for genetic studies of cocaine, opioid, and alcohol dependence, to conduct PheWAS of psychiatric and somatic PGS. Yale-Penn participants completed the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA), which queries medical, psychosocial, and substance use history and diagnoses, psychiatric diagnoses, and demographics [26, 27]. Previous studies have utilized the Yale-Penn sample to conduct gene × environment studies [28] and linkage and association studies of substance use and dependence [29,30,31,32,33,34,35,36], and to examine phenotypic associations [37]. These studies have shown shared genetic liability across SUDs, psychiatric disorders, and environmental traits.

Previously, using the Yale-Penn sample, we created a simplified PheWAS dataset for genetic analysis and calculated PGS to examine pleiotropy for four major substance-related traits: alcohol use disorder, opioid use disorder, smoking initiation, and lifetime cannabis use [38]. PheWAS analyses in European-ancestry participants identified significant associations of SUD PGS with substance use and psychiatric diagnoses and with demographic and environmental phenotypes. Here, we extend this work by examining the associations of PGS for a variety of psychiatric disorders and somatic traits in the Yale-Penn sample.

Methods

Participants and procedures

The Yale-Penn sample (N = 14,040) was recruited from five U.S. academic sites for studies of the genetics of cocaine, alcohol, and opioid use disorders. The institutional review boards at University of Connecticut Health, Medical University of South Carolina, McLean Hospital, University of Pennsylvania, and Yale University approved the study protocol and informed consent forms. After they gave informed consent, all participants were administered the SSADDA and provided a blood or saliva sample for genotyping. The SSADDA comprises 24 modules that assess demographic information, environmental variables, medical history, and psychiatric and substance use history and diagnoses [26]. Additional information on variable selection and cleaning has been published [38]. In brief, the SSADDA yields over 3700 variables, which we refined to 691 variables for use in PheWAS by selecting variables that were considered informative for genetic studies and nonduplicative [38]. These variables are grouped into 25 categories: Demographics, Medical History, Substance Use (Tobacco, Alcohol, Cocaine, Opiate, Marijuana, Sedatives, Stimulants, Other drugs), Psychiatric (Major Depression, Conduct Disorder, Antisocial Personality Disorder [ASPD], Attention Deficit Hyperactivity Disorder [ADHD], Suicidality, Post-Traumatic Stress Disorder [PTSD], Generalized Anxiety Disorder [GAD], Panic Disorder, Social Phobia, Mania, Agoraphobia, Obsessive Compulsive Disorder [OCD], Schizophrenia, and Gambling) and Environment.

Case and control definitions

Participants who endorsed Diagnostic and Statistical Manual (DSM) criteria for a given lifetime disorder (DSM-IV for psychiatric disorders, DSM-IV and DSM-5 for SUDs) were coded as cases and those who met no diagnostic criteria were considered controls. Participants meeting a sub-threshold number of criteria (e.g., one criterion when multiple are required for diagnosis) were excluded from analyses for that disorder. For individual symptoms (e.g., suicide attempt), participants who responded affirmatively were considered cases and those who did not were considered controls. When an item was not answered, participants were coded as "NA" and not included as either a case or a control for that variable.
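The coding rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the function names, the use of `None` to stand in for "NA", and the idea of passing a per-disorder `threshold` are all assumptions made for the example.

```python
from typing import Optional


def code_diagnosis(criteria_met: Optional[int], threshold: int) -> Optional[int]:
    """Return 1 for a case, 0 for a control, None for excluded or missing.

    criteria_met: number of DSM criteria endorsed (None if not answered).
    threshold:    hypothetical number of criteria required for the diagnosis.
    """
    if criteria_met is None:
        return None  # unanswered item, treated as "NA"
    if criteria_met >= threshold:
        return 1  # meets diagnostic criteria: case
    if criteria_met == 0:
        return 0  # no criteria endorsed: control
    return None  # sub-threshold: excluded for this disorder


def code_symptom(response: Optional[bool]) -> Optional[int]:
    """Code a single yes/no item: affirmative = case, negative = control."""
    if response is None:
        return None  # unanswered item, treated as "NA"
    return 1 if response else 0
```

Under these assumptions, a participant endorsing one criterion of a disorder that requires three would be returned as `None` and so would drop out of both the case and control groups for that disorder.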

Genotyping and imputation

In brief, Yale-Penn participants were genotyped in three batches using Illumina microarrays at the Center for Inherited Disease Research (CIDR) or the Gelernter lab at Yale, and genotypes were imputed using the Michigan Imputation Server [39] with the 1000 Genomes phase 3 reference panel [40]. Details on genotyping, imputation, and quality control for the genetic data have previously been reported [36, 38, 41, 42].

Ancestry-specific PGS were calculated using PRS-Continuous Shrinkage (PRS-CS) software [43] from GWAS in discovery samples for anorexia nervosa (AN) [44], autism spectrum disorder (ASD) [45], bipolar disorder (BD) [46], generalized anxiety disorder (GAD) [47], major depressive disorder (MDD) [48], obsessive compulsive disorder (OCD) [49], panic disorder (PD) [50], post-traumatic stress disorder (PTSD) [51], schizophrenia (SCZ) [52], Tourette syndrome (TS) [53], body mass index (BMI) [54], coronary artery disease (CAD) [55] and type 2 diabetes (T2D) [56] (Supplementary Table 1). GWAS were available for all traits in European-ancestry (EUR) samples, but only for BD [57], GAD [47], MDD [21], PTSD [51], and SCZ [52] in African-ancestry (AFR) samples. Discovery GWAS were selected based on their public availability and excluded the Yale-Penn sample. The global shrinkage parameter phi was learned from the data and default values were used for other parameters as described on the GitHub page for the software (https://github.com/getian107/PRScs).
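The text does not describe how per-variant weights were turned into per-person scores. A common approach, shown here purely as an assumption, is a weighted sum of effect-allele dosages using the posterior effect sizes that a shrinkage method produces. The function name, variant IDs, and data layout below are hypothetical.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Weighted sum of effect-allele dosages over variants present in both inputs.

    dosages: variant ID -> effect-allele dosage for one person (0 to 2).
    weights: variant ID -> per-allele effect size (e.g., a posterior estimate).
    Variants missing from the person's genotypes are skipped.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


# Toy example with made-up variant IDs and weights.
example_weights = {"rsA": 0.10, "rsB": -0.20}
example_person = {"rsA": 2.0, "rsB": 1.0, "rsC": 0.0}
example_score = polygenic_score(example_person, example_weights)
```

In practice this step is usually delegated to a dedicated scoring tool; the sketch only illustrates the arithmetic.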

Statistical analysis

For PGS with available primary phenotypes (diagnoses for AN, ASD and TS are not available in the Yale-Penn sample, and individuals with SCZ diagnoses were excluded from recruitment), we tested for association between the PGS and the primary phenotype using logistic regression models in R, with p < 0.05 considered significant. We next conducted a series of PheWAS using logistic regression models for binary traits and linear regression models for continuous traits, adjusting for age, sex, and the top 10 genetic principal components within each ancestry. Phenotypes with fewer than 100 cases or controls were excluded. For available phenotypes, a second PheWAS was run that covaried for the primary diagnostic phenotype. A Bonferroni correction was applied within each ancestry group to account for multiple comparisons (AFR phenotypes n = 574, p = 8.7 × 10^−5; EUR phenotypes n = 620, p = 8.1 × 10^−5). Nagelkerke R² was calculated to quantify the variance explained by PGS only and PGS with covariates. We also calculated a pseudo-R² metric developed by Lee et al. [58], measured on a liability scale, to avoid bias in these estimates due to sample prevalence not being equal to population prevalence.
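The Bonferroni thresholds and the case/control minimum described above follow from simple arithmetic, sketched below in Python. The analyses themselves were run in R; the function names and the default `alpha` of 0.05 are assumptions for illustration.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def is_eligible(n_cases: int, n_controls: int, minimum: int = 100) -> bool:
    """Keep a binary phenotype only if it has at least `minimum` cases and controls."""
    return n_cases >= minimum and n_controls >= minimum


# Number of phenotypes tested per ancestry group, as stated in the text.
afr_threshold = bonferroni_threshold(574)
eur_threshold = bonferroni_threshold(620)
print(f"AFR: {afr_threshold:.1e}, EUR: {eur_threshold:.1e}")
# prints "AFR: 8.7e-05, EUR: 8.1e-05"
```

Because the threshold is set per ancestry group, the AFR analyses (574 phenotypes) use a slightly less stringent cutoff than the EUR analyses (620 phenotypes).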

Results

Sample

Genetic data were available for 10,275 of the 14,040 participants, the majority of whom (54.46%) were male. The sample included 4851 AFR participants (55.2% male) and 5424 EUR participants (51.1% male), whose mean ages were 41.47 (SD = 10.16) and 39.79 (SD = 12.91) years, respectively. Supplementary Table 2 shows demographic information for available primary phenotypes.

Primary phenotypic associations of PGS

For the PGS with primary phenotypes available, we tested the association of PGS with each primary phenotype (Fig. 1A). In AFR participants, none of the PGS were associated with their primary phenotype. In EUR participants, PGS for three psychiatric disorders (PGSMDD, PGSPD, and PGSPTSD) and three somatic traits (PGSBMI, PGSCAD and PGST2D) were associated with their primary phenotype at p < 0.05. The proportion of phenotypic variance explained by the PGS alone ranged from 0.26 to 10.10% (Nagelkerke's pseudo-R²) and 0.10 to 4.68% (liability scale R²), in line with previous estimates (Table 1).
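For readers unfamiliar with liability-scale estimates, the sketch below shows the transformation commonly attributed to Lee et al. for converting an observed-scale R² from a case-control sample to the liability scale. It is an assumption that this matches the exact computation used here. The input is taken to be the variance explained on the observed 0/1 scale (not Nagelkerke's pseudo-R²), and the population prevalence must be supplied by the analyst.

```python
from statistics import NormalDist


def liability_r2(r2_observed: float, prevalence: float, case_fraction: float) -> float:
    """Convert an observed-scale R2 to the liability scale.

    r2_observed:   variance explained on the observed (0/1) scale.
    prevalence:    assumed population prevalence of the disorder (K).
    case_fraction: proportion of cases in the analyzed sample (P).
    """
    k, p = prevalence, case_fraction
    norm = NormalDist()
    t = norm.inv_cdf(1.0 - k)  # liability threshold for the given prevalence
    z = norm.pdf(t)            # height of the normal density at the threshold
    m = z / k                  # mean liability among cases
    # Scale factor correcting for prevalence and case-control ascertainment.
    c = (k * (1.0 - k) / z ** 2) * (k * (1.0 - k) / (p * (1.0 - p)))
    shift = m * (p - k) / (1.0 - k)
    theta = shift * (shift - t)
    return r2_observed * c / (1.0 + r2_observed * theta * c)
```

When the sample case fraction equals the population prevalence, `theta` is zero and the conversion reduces to multiplying by K(1 − K)/z². The ascertainment terms matter in a sample such as this one, where cases are likely to be over-represented relative to the general population.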

Fig. 1: Primary and secondary associations of psychiatric and somatic PGS.

A Effect size and 95% confidence intervals for associations between PGS and their corresponding primary phenotype, if available. Asterisks indicate p-value for significant associations: *p < 0.05, **p < 0.01, ***p < 0.001. B Number of associations within each category for the PGS with significant associations. BD bipolar disorder, GAD generalized anxiety disorder, MDD major depressive disorder, OCD obsessive compulsive disorder, PD panic disorder, PTSD post-traumatic stress disorder, SCZ schizophrenia, T2D type 2 diabetes, CAD coronary artery disease, BMI body mass index.

Table 1 Phenotypic variance explained by the PGS that were significantly associated with their primary phenotype in EUR.

We next examined phenotypic associations of each PGS other than the primary phenotype. In AFR participants, after Bonferroni correction, there were significant associations for PGSBD (Supplementary Table 3). No other associations were observed among AFR participants following Bonferroni correction (Supplementary Tables 4–7). In EUR participants, there were significant associations for five of the psychiatric PGS (PGSMDD, PGSGAD, PGSPTSD, PGSSCZ, and PGSTS) and all three somatic PGS (PGSBMI, PGSCAD and PGST2D), whereas there were no significant associations for PGSBD, PGSAN, PGSASD, PGSOCD or PGSPD (Supplementary Tables 8–20).

Phenome-wide analysis of psychiatric PGS

Bipolar disorder (BD)

In AFR participants, PGSBD was associated with three phenotypes in the substance use category, all related to cocaine (e.g., regularly use cocaine, OR = 1.14, CI = 1.07–1.20, p = 8.6 × 10^−5; Fig. 1B; Supplementary Tables 3 and 21). PGSBD was not associated with any phenotypes in EUR participants (Supplementary Table 8).

Major depressive disorder (MDD)

In EUR participants, PGSMDD was associated with 220 phenotypes across 17 categories (Figs. 1B and 2; Supplementary Tables 9 and 21). Although PGSMDD was not significantly associated with the MDD diagnosis following Bonferroni correction (OR = 1.13, CI = 1.05–1.21, p = 3.8 × 10^−3; Fig. 1A), it was significantly associated with 15 phenotypes in the depression category, most significantly the MDD criterion count (β = 0.40, CI = 0.30–0.50, p = 3.0 × 10^−15). These phenotypes remained significantly associated when covarying for MDD diagnosis (Supplementary Fig. 1, Supplementary Table 9).

Fig. 2: Phenome-wide association results for MDD, GAD and PTSD.

Phenotype categories are plotted along the x-axis, and −log10(p-value) × direction of effect is plotted on the y-axis. Selected phenotypes passing Bonferroni correction are labeled.

PGSMDD also showed 55 significant associations with other psychiatric disorders. Notably, PGSMDD was the only PGS associated with any phenotypes in the depression, generalized anxiety (e.g., sum of physical reactions (β = 0.14, CI = 0.08–0.20, p = 5.9 × 10^−6)), panic disorder (e.g., shortness of breath (OR = 1.29, CI = 1.18–1.41, p = 4.0 × 10^−9)), agoraphobia (e.g., ever agoraphobic (OR = 1.16, CI = 0.08–0.23, p = 5.84 × 10^−5)), and suicide categories (Supplementary Table 21). Five phenotypes related to suicide were significantly associated with PGSMDD, the most significant being high suicidal intent (OR = 1.37, CI = 1.27–1.48, p = 6.5 × 10^−9). Covarying for the MDD diagnosis reduced the number of associations in the panic disorder category from 18 to 13, and the number of associations with GAD from 4 to 0. PGSMDD was the only PGS associated with the number of inpatient psychiatric treatments (β = 0.50, CI = 0.32–0.68, p = 2.8 × 10^−8), emotional problems (OR = 1.28, CI = 1.21–1.35, p = 6.0 × 10^−12) and history of antidepressant use (OR = 1.24, CI = 1.18–1.30, p = 2.0 × 10^−12).

Other categories with significant associations included PTSD (e.g., criterion count (β = 0.26, CI = 0.20–0.31, p = 2.3 × 10^−18)), conduct disorder (e.g., truancy, being suspended or expelled from school (OR = 1.31, CI = 1.25–1.38, p = 8.5 × 10^−16)), ASPD (e.g., impulsivity (OR = 1.17, CI = 1.11–1.24, p = 8.6 × 10^−7)), and ADHD (e.g., criterion count (β = 0.09, CI = 0.06–0.12, p = 9.6 × 10^−10)). Additionally, PGSMDD was significantly associated with demographic and environmental phenotypes, including negatively with education (β = −0.10, CI = −0.12–0.08, p = 8.7 × 10^−20) and positively with childhood adversity (OR = 1.29, CI = 1.22–1.35, p = 6.8 × 10^−13).

PGSMDD was also significantly associated with 126 substance use phenotypes, 122 of which remained significant after covarying for the MDD diagnosis. The substance use traits most significantly associated with PGSMDD in each category were the Fagerström Test for Nicotine Dependence (FTND) score (β = 0.38, CI = 0.31–0.46, p = 8.7 × 10^−23), criterion count for DSM-5 cocaine use disorder (CocUD; β = 0.56, CI = 0.44–0.69, p = 1.3 × 10^−18), "ever used" opioids (OR = 1.31, CI = 1.25–1.37, p = 5.1 × 10^−17), and DSM-IV alcohol abuse (OR = 1.25, CI = 1.18–1.31, p = 2.3 × 10^−11). Notably, PGSMDD had the most alcohol associations of the PGS tested.

Generalized anxiety disorder (GAD)

PGSGAD was associated with 85 phenotypes in EUR participants (Figs. 1B and 2; Supplementary Tables 10 and 21). Although it was not significantly associated with the primary diagnosis of GAD (OR = 1.21, CI = 1.01–1.40, p = 0.06; Fig. 1), it was the only PGS to be associated with a history of anxiolytic treatment (OR = 1.19, CI = 1.11–1.26, p = 8.7 × 10^−6).

PGSGAD was significantly associated with four phenotypes related to other psychiatric disorders, which were hyperactivity-impulsivity (β = 0.18, CI = 0.12–0.25, p = 1.2 × 10^−7) and inattention (β = 0.17, CI = 0.09–0.25, p = 2.4 × 10^−5) for ADHD; truancy, being suspended or expelled from school (OR = 1.18, CI = 1.12–1.24, p = 2.9 × 10^−7) for conduct disorder; and seeking treatment for PTSD (OR = 1.21, CI = 1.12–1.31, p = 4.4 × 10^−5). Additionally, PGSGAD was significantly associated with non-psychiatric phenotypes, such as health rating (higher value indicates poorer health; β = 0.07, CI = 0.05–0.10, p = 2.4 × 10^−7) and education (β = −0.07, CI = −0.09–−0.05, p = 1.0 × 10^−11).

PGSGAD was significantly associated with 73 substance use phenotypes, particularly in the tobacco, cocaine, and opioid categories (e.g., FTND score (β = 0.21, CI = 0.14–0.28, p = 2.5 × 10^−8), using more cocaine than intended (OR = 1.18, CI = 1.07–1.19, p = 1.0 × 10^−8), DSM-IV opioid dependence (OR = 1.23, CI = 1.17–1.29, p = 3.3 × 10^−11), and DSM-5 alcohol use disorder (AUD, OR = 1.16, CI = 1.09–1.23, p = 2.1 × 10^−5)). Unlike other psychiatric PGS, PGSGAD also had two significant associations with marijuana use.

When covarying for the primary phenotype, half of the phenotypes associated with PGSGAD were no longer significant, although the association with anxiolytic treatment remained significant (Supplementary Fig. 2, Supplementary Table 10). The tobacco category had the greatest reduction in number of associations, with 8 of 11 phenotypes no longer significant when GAD diagnosis was included as a covariate.

Post-traumatic stress disorder (PTSD)

PGSPTSD was associated with a total of 90 phenotypes in EUR participants (Figs. 1B and 2; Supplementary Tables 11 and 21). PGSPTSD showed significant associations with the diagnosis of PTSD (OR = 1.20, CI = 1.11–1.28, p = 3.8 × 10^−5) and with treatment-seeking for PTSD (OR = 1.21, CI = 1.11–1.30, p = 5.8 × 10^−5). The only other association in the psychiatric category was with truancy, being suspended or expelled from school in the conduct disorder category (OR = 1.18, CI = 1.11–1.24, p = 3.9 × 10^−7). These associations were no longer significant when the PTSD diagnosis was used as a covariate in the analysis (Supplementary Fig. 3, Supplementary Table 11).

PGSPTSD was significantly associated with 77 cocaine, tobacco, and opioid use phenotypes, including DSM-IV dependence and withdrawal symptoms for all three substances. Almost half of these traits were not significant when the PTSD diagnosis was used as a covariate.

Similar to other PGS results, PGSPTSD was also significantly associated with demographic phenotypes (e.g., education (β = −0.07, CI = −0.09–−0.05, p = 5.8 × 10^−11)), environment phenotypes (e.g., household members being cigarette smokers (OR = 1.18, CI = 1.11–1.24, p = 2.2 × 10^−7)), and medical phenotypes (e.g., health rating (β = 0.08, CI = 0.05–0.10, p = 8.4 × 10^−8)). However, the majority of these phenotypes became nonsignificant when the PTSD diagnosis was used as a covariate.

Schizophrenia (SCZ)

PGSSCZ was associated with 14 phenotypes in EUR participants (Figs. 1B and 2; Supplementary Tables 12 and 21). The only associations with non-substance use phenotypes were with truancy, being suspended or expelled from school (OR = 1.21, CI = 1.13–1.28, p = 3.4 × 10^−7) in the conduct disorder category and a negative association with household income (β = −0.15, CI = −0.22–−0.08, p = 4.4 × 10^−5) in the demographics section.

Among substance use phenotypes, PGSSCZ was significantly associated with several alcohol use (e.g., reduction in other activities (OR = 1.22, CI = 1.16–1.29, p = 4.2 × 10^−9)) and cocaine use phenotypes, such as failure to fulfill obligations (OR = 1.16, CI = 1.09–1.23, p = 1.4 × 10^−5).

Tourette's syndrome (TS)

In EUR participants, PGSTS was associated with one environmental phenotype (Supplementary Tables 13 and 21), frequency of moving/relocation as a child (β = 0.18, CI = 0.09–0.27, p = 7.8 × 10^−5).

Phenome-wide analysis of somatic PGS

Body-mass index (BMI)

In EUR participants, PGSBMI was associated with 138 phenotypes (Figs. 1B and 3; Supplementary Tables 18 and 21), the most significant of which was the primary phenotype, BMI (β = 1.94, CI = 1.79–2.09, p = 4.1 × 10^−133). Three demographic variables were significant, including education (β = −0.11, CI = −0.14–−0.09, p = 1.2 × 10^−21). PGSBMI was associated with 6 medical phenotypes, including health rating (β = 0.13, CI = 0.10–0.16, p = 5.9 × 10^−17) and diabetes (OR = 1.67, CI = 1.53–1.81, p = 2.6 × 10^−12). The demographic phenotypes remained associated when BMI was used as a covariate, but half of the medical associations did not (Supplementary Fig. 4, Supplementary Table 18).

Fig. 3: Phenome-wide association results for BMI, CAD and T2D.

Phenotype categories are plotted along the x-axis, and −log10(p-value) × direction of effect is plotted on the y-axis. Selected phenotypes passing Bonferroni correction are labeled.

Seventeen psychiatric disorder phenotypes were associated with PGSBMI, including 10 associations with traits in the conduct disorder (e.g., truancy, being suspended or expelled from school (OR = 1.27, CI = 1.20–1.34, p = 1.5 × 10^−11)) and ASPD (e.g., irritability/aggression (OR = 1.24, CI = 1.18–1.31, p = 6.8 × 10^−11)) categories; hyperactivity-impulsivity (β = 0.19, CI = 0.11–0.26, p = 1.3 × 10^−6) in the ADHD section; and six phenotypes in the PTSD section, including PTSD diagnosis (OR = 1.24, CI = 1.14–1.34, p = 1.3 × 10^−5). When covarying for BMI, only 5 of the 17 associations remained significant.

PGSBMI was also significantly associated with 105 substance use phenotypes, including multiple SUD diagnoses. There were also numerous associations of PGSBMI with heaviness of use, withdrawal, and physiological symptoms for a variety of substances. Although PGSBMI was not associated with an AUD diagnosis, it was uniquely negatively associated with the ages of first alcohol use (β = −0.27, CI = −0.38–−0.17, p = 3.4 × 10^−7), regular use (β = −0.28, CI = −0.40–−0.17, p = 1.6 × 10^−6), and first intoxication (β = −0.25, CI = −0.35–−0.14, p = 4.6 × 10^−6). Although the alcohol phenotypes were no longer significant when BMI was included as a covariate, the majority of the other substance use phenotypes remained significantly associated.

PGSBMI was also associated with several environmental variables. These included exposures to substance use in childhood, such as household members being cigarette smokers (OR = 1.25, CI = 1.18–1.32, p = 7.8 × 10^−10) and frequent drug/alcohol use in the household (OR = 1.23, CI = 1.16–1.29, p = 1.0 × 10^−9). Lifetime trauma assessment (OR = 1.20, CI = 1.14–1.24, p = 7.9 × 10^−9) and childhood adversity (OR = 1.23, CI = 1.15–1.30, p = 3.6 × 10^−8) were also significantly associated with PGSBMI.

Coronary artery disease (CAD)

In EUR participants, PGSCAD was significantly associated with 13 phenotypes (Figs. 1B and 3; Supplementary Tables 19 and 21), including the primary phenotype of heart disease (OR = 1.38, CI = 1.22–1.54, p = 4.7 × 10^−5), BMI (β = 0.30, CI = 0.16–0.45, p = 4.7 × 10^−5) and a negative association with education (β = −0.07, CI = −0.09–−0.04, p = 1.3 × 10^−9). The remaining significant associations were with 10 tobacco use phenotypes (e.g., FTND score (β = 0.25, CI = 0.18–0.33, p = 3.4 × 10^−11)). The majority of these associations remained significant when the primary phenotype was included as a covariate (Supplementary Fig. 5; Supplementary Table 19).

Type 2 diabetes (T2D)

PGST2D in EUR participants was significantly associated with 71 phenotypes, including diabetes (OR = 2.18, CI = 2.02–2.34, p = 7.2 × 10^−22) (Figs. 1B and 3; Supplementary Tables 20 and 21). PGST2D was associated with seven medical and demographic phenotypes, including BMI (β = 0.58, CI = 0.41–0.76, p = 3.3 × 10^−11) and health rating (β = 0.10, CI = 0.07–0.13, p = 9.3 × 10^−9), five of which remained significant when the primary phenotype was included as a covariate.

Truancy, being suspended or expelled from school in the conduct disorder group was the only psychiatric phenotype associated with PGST2D (OR = 1.17, CI = 1.09–1.25, p = 4.2 × 10^−5). PGST2D was significantly associated with 63 substance use phenotypes, including the FTND score (β = 0.29, CI = 0.20–0.38, p = 3.5 × 10^−10), DSM-5 OUD (OR = 1.25, CI = 1.18–1.32, p = 7.6 × 10^−10), DSM-5 CocUD (OR = 1.20, CI = 1.12–1.27, p = 8.9 × 10^−7), and DSM-5 CanUD (OR = 1.17, CI = 1.09–1.24, p = 5.9 × 10^−5). The majority of these remained significant when covarying for the primary phenotype (Supplementary Fig. 6, Supplementary Table 20).

Comparison between PGS phenotypic associations

Ninety-eight phenotypes were significantly associated with two or more PGS (Fig. 4). Common demographic variables across PGS included negative associations with education and income, and positive associations with BMI. Of the medical phenotypes, health rating and number of medical problems were the most common significant associations. Several environmental variables were also associated with multiple PGS, most commonly household members using cigarettes. Interestingly, PGSMDD and PGSBMI shared many associated phenotypes in the environment category and in psychiatric categories, where common associations were observed for ADHD, antisocial personality disorder, conduct disorder, and PTSD phenotypes. Truancy, being suspended or expelled from school, from the conduct disorder section, was significant across all PGS except PGSCAD. Substance use categories (alcohol, cocaine, marijuana, opiate, and tobacco use) exhibited widespread commonalities across both psychiatric and somatic PGS. Notably, DSM-5 criterion count for cocaine was associated with four PGS, whereas sum of withdrawal problems and DSM-5 criterion count for opiate use were both associated with five of the seven PGS. Numerous tobacco use phenotypes were associated with all PGS save for PGSSCZ, whereas the majority of common significant associations observed for alcohol phenotypes were between PGSMDD and PGSSCZ.

Fig. 4

Heatmap of selected phenotypes that were common across two or more PGS.

Discussion

This study examined the performance of psychiatric and somatic PGS in the deeply phenotyped Yale-Penn sample, in which most participants were ascertained based on having one or more lifetime SUDs. The SSADDA yields a wealth of phenotypic data not typically available in the EHR-based biobanks traditionally used for this type of analysis; we were therefore able both to replicate previous findings and to identify several novel cross-trait associations. For all PGS, the largest number of associations were with phenotypes in the substance use categories. This is consistent with the high prevalence of SUDs and the large number of individual traits ascertained for each substance in this sample, and highlights the high degree of pleiotropy of SUDs with both psychiatric and medical phenotypes. Also, as might be expected, compared to the somatic PGS, psychiatric disorder PGS showed more associations with phenotypes in psychiatric categories, both within and across disorders.

Several PGS were significantly associated with their primary phenotypes. The three somatic PGS were associated with their respective primary phenotypes, indicative of the power of the PGS. PGSPTSD in EUR participants was associated with a PTSD diagnosis. However, PGS for MDD and GAD in EUR participants were not associated with their respective primary diagnoses following Bonferroni correction, though both were associated with related phenotypes, such as DSM criterion count for MDD and the use of medications to treat anxiety. The lack of association of PGS with their primary phenotypes could be due to the sample's ascertainment strategy, which focused on the presence of one or more SUDs.

Some PGS did not yield any significant associations. In AFR participants, only PGSBD showed any significant associations, and none were with BD-related phenotypes. Although no other associations with PGS were significant, some of the AFR PGS showed nominal associations (i.e., p < 0.05) that may become significant with a better powered PGS derived from a larger originating GWAS (e.g., the association of PGSMDD with "ever depressed"). Because the SSADDA interview does not assess autism, Tourette's syndrome, or eating disorders, primary associations for these PGS could not be tested.

PGSMDD showed the most associations of any PGS tested. Notably, it was also the only PGS to yield significant associations with depression- and suicide-related phenotypes. While other EHR-based PheWAS have demonstrated strong associations of PGS for MDD with the primary diagnosis [59, 60], our strongest association among depression phenotypes was for the MDD criterion count. Moreover, each of the nine individual MDD diagnostic criteria was also significantly associated with PGSMDD, suggesting a genetic contribution to each. Most of the associations with psychiatric phenotypes remained significant when the depression diagnosis was covaried, indicating that the associations are not due to co-occurring MDD. As with previous findings in an EHR-based PheWAS, we observed associations of PGSMDD with alcohol and tobacco use phenotypes, GAD, PTSD, and agoraphobia [59]. The association between SUDs and MDD was also found in our previous analysis in this sample, which demonstrated associations between PGS for SUDs and a number of depression phenotypes [38]. Interestingly, numerous withdrawal-related phenotypes for cocaine, tobacco, opioids, and alcohol were also significantly associated with PGSMDD, as were treatment-seeking for depression and other psychiatric disorders.

Few studies have examined the performance of anxiety-related PGS. One study in which a PGSPTSD was tested in four EHR-based biobanks [61] showed significant associations with a PTSD diagnosis, a SUD diagnosis, and tobacco dependence, as well as numerous associations with medical conditions, including circulatory and respiratory diseases. In contrast to our findings, that study showed associations with various anxiety disorders and depression, which may have been due to the large size of the included biobanks and the higher number of cases for anxiety disorders. In our PheWAS, both PGSPTSD and PGSGAD were associated with cocaine, opioid, and tobacco diagnoses, criterion counts, and treatment seeking for use of those substances, which is suggestive of an association of greater genetic risk for anxiety with greater SUD severity.

Participants who, during screening for study participation, self-reported having a schizophrenia or bipolar disorder diagnosis were excluded from the Yale-Penn sample. Thus, the lack of associations of PGSBD with the primary diagnosis and related phenotypes was not unexpected, and we did not test for association between PGSSCZ and schizophrenia due to low sample size (n = 6). As with previous PheWAS, PGSSCZ was associated with substance use and personality disorder phenotypes [62]. Given the high rate at which SCZ and tobacco use co-occur, and the previously observed association of PGS for SCZ with tobacco use [62], the lack of tobacco associations here was unexpected and may also be attributable to the exclusion of participants with psychotic disorders from the Yale-Penn sample.

In addition to being strongly associated with their primary phenotypes, all three somatic PGS were associated with BMI and several tobacco-related phenotypes. Previous studies conducted using data from the UK Biobank and Penn Medicine BioBank showed associations of PGSBMI with T2D, circulatory system disorders, and sleep problems [25, 63]. We also found associations of PGSBMI with numerous substance-related phenotypes and environmental factors. Lifetime trauma assessment, childhood adversity, and childhood exposure to substance use, experiences that have been shown to predict higher BMI [64], were also significantly associated with PGSBMI. PGST2D, as expected, was associated with measures of poor health and numerous substance use phenotypes, the majority of which persisted after controlling for a diabetes diagnosis. Higher rates of SUDs have been observed in individuals with T2D [65], and individuals with an SUD and T2D experience poorer medical outcomes and higher mortality than those with T2D alone [66], though little is currently known about the pleiotropy of these traits. Akin to previous work [67], PGSCAD was associated with tobacco use phenotypes but not with other substance use phenotypes, such as those for alcohol, or with medical disorders, such as T2D. PGSCAD did not yield any associations in the psychiatric category and PGST2D had only one, which was no longer significant after covarying for a diabetes diagnosis, suggesting that genetic liability for these medical disorders is not associated with psychiatric phenotypes in this sample.

This study should be interpreted in light of its strengths and limitations. The Yale-Penn dataset used as a target sample is comparatively small and cross-sectional, without the longitudinal and medical records data available in large, EHR-based genetic studies. However, the in-depth SSADDA interview provides granular psychiatric and substance use data not available in EHR-based biobanks, which offers the possibility of novel insights into the pleiotropy of co-occurring traits. The Yale-Penn sample excluded individuals with certain psychiatric illnesses, including a self-reported diagnosis of schizophrenia or bipolar disorder at the time of telephone screening, thus limiting our ability to observe some associations. For the primary phenotypes that we did test, PGS for psychiatric disorders explained only a small proportion of phenotypic variance (<1.4%), although PGS for somatic traits explained a higher proportion (up to 10% for BMI). Available discovery GWAS varied in size, and GWAS that included individuals of AFR ancestry were not available for all the phenotypes of interest. Moreover, the numbers of participants in the originating AFR GWAS were consistently much smaller than those available for EUR. Because the Yale-Penn sample includes similar numbers of AFR and EUR participants, we believe that larger discovery GWAS in AFR participants and the accompanying increase in statistical power will be more informative of pleiotropy in non-EUR populations.

Despite these limitations, our findings demonstrate the pleiotropic nature of genetic liability for psychiatric disorders and somatic traits. Both psychiatric and somatic PGS were broadly associated with substance use phenotypes in a sample enriched for individuals with SUDs. Despite the extensive pleiotropy found, we also identified associations that were unique to specific PGS. Furthermore, psychiatric PGS were more likely than somatic PGS to be associated with psychiatric disorders, suggesting some level of specificity of genetic architecture within categories. Many phenotypes remained associated after covarying for the primary phenotype on which the PGS was based, suggesting that the genetic liability for the disorders in question is the primary driver of the associations. Overall, we find evidence that genetic liability for psychiatric disorders and somatic traits partially underlies the common co-occurrence of these traits with SUDs.