Genome-wide association study of psychiatric and substance use comorbidity in Mexican individuals

The combination of substance use and psychiatric disorders is one of the most common comorbidities. The objective of this study was to perform a genome-wide association study of this comorbidity (Com), substance use alone (Subs), and psychiatric symptomatology alone (Psych) in the Mexican population. The study included 3914 individuals of Mexican descent. Genotyping was carried out using the PsychArray microarray and genome-wide correlations were calculated. Genome-wide associations were analyzed using multiple logistic models, polygenic risk scores (PRSs) were evaluated using multinomial models, and vertical pleiotropy was evaluated by generalized summary-data-based Mendelian randomization. Brain DNA methylation quantitative loci (brain meQTL) were also evaluated in the prefrontal cortex. Genome-wide correlation and vertical pleiotropy were found between all traits. No genome-wide association signals were found, but 64 single-nucleotide polymorphism (SNPs) reached nominal associations (p < 5.00e−05). The SNPs associated with each trait were independent, and the individuals with high PRSs had a higher prevalence of tobacco and alcohol use. In the multinomial models all of the PRSs (Subs-PRS, Com-PRS, and Psych-PRS) were associated with all of the traits. Brain meQTL of the Subs-associated SNPs had an effect on the genes enriched in insulin signaling pathway, and that of the Psych-associated SNPs had an effect on the Fc gamma receptor phagocytosis pathway.

www.nature.com/scientificreports/ All participants provided written informed consent or assent, and the protocol complied with international norms and the Helsinki Declaration. The protocols for the MxGDAR/Encodat were reviewed and approved by the Research Ethics Committee of the Instituto Nacional de Psiquiatría (No. CEI/C/083/2015) and the Instituto Nacional de Medicina Genómica (No. 01/2017/I); the protocol for the MeDaCrosR was reviewed and approved by the Research Ethics Committee of the Instituto Nacional de Medicina Genómica (Nos. 23/2015/I and 06/2018/I).
Microarray analysis. DNA was extracted by mouth swab or from blood cells using a modified salting-out method with the Puregen commercial kit (Quiagen, USA). The quality and integrity of the DNA was analyzed with a NanoDrop spectrophotometer (Thermofisher, USA) with 2% agarose gel. Genotyping was performed with the commercial microarray Infinium PsychArray (Illumina, USA). Fluorescence intensities were read with iScan (Illumina, USA) and converted to genotypes with GenomeStudio software (Illumina, USA). Genotyping of the MxGDAR and the MeDaCrosR was carried out in the high-technology microarray unit of the Instituto Nacional de Medicina Genómica, using the same protocol.
Quality control of single nucleotide polymorphisms (SNPs). Quality control was performed with Plink software 28 . SNPs were excluded if they had a variant calling greater than 95%, a minor allele frequency (MAF) greater than 5%, a Hardy-Weinberg equilibrium chi-square test p-value greater than 1e−5, and variants A/T or G/C (to avoid the flip strand effect). Individuals with a genotype call rate of less than 95% were excluded. To correct for cryptic relationships, all individual pairs with an identity-by-state value greater than 1.6 were marked, and the individual with the lowest genotype call rate was excluded 29 . Statistical analysis. Estimation of variance and correlation at the genome-wide level. The variance explained by the SNPs was calculated using a restricted maximum-likelihood analysis (GREML) 30,31 with GCTA software 32 . Three models of explained variance were proposed, comparing: (1) Com with Cont, (2) Psych with Cont, and (3) Subs with Cont. A bivariate GREML analysis was also performed to evaluate genome-wide correlation between Com and Subs, Com and Psych, and Subs and Psych. All of the models were adjusted for age, sex, and ten principal components of global ancestry.
Estimation of global ancestry. Estimation of global ancestry was performed with a principal components analysis using previously reported algorithms 33 and the PC-AiR package 34 . The Human Genome Diversity Project (HGDP) 35 was used as a reference for population-level genotypes. Only independent SNPs were included in this estimation; for this reason a process of linkage disequilibrium pruning (LD pruning) was carried out, using the following parameters: a window size of 50 kb, a step of 2, and a variance inflation factor of 5.
Genome-wide genotype-phenotype associations. The genetic associations were carried out by means of multiple logistic regressions, adjusted for age, sex, and ten components of global ancestry as covariables. The logistic regressions considered (1) the Com group (MxGDAR, n = 423, MeDaCrosR, n = 181), a total of 604 cases; (2) the Subs group (MxGDAR = 1165); (3) the Psych group (MxGDAR = 318, MeDaCrosR = 340), a total of 658 cases; and (4) the Cont group (n = 3310). Three statistical contrasts were performed: (a) Subs and Cont, (b) Psych and Cont; and (c) Com and Cont. A p-value of 5.00e−05 was considered nominally associated and a value of 5.00e−08 was considered statistically significant on the genome-wide level. The logistic regressions were carried out in Plink 28 . After statistical contrasts, all of the variants were removed that had a MAF lower than 5.0% in cases or controls. An in silico functional annotation was also carried out of the associated SNPs using Variant Effect Predictor (VEP) 36 . In addition, the allelic frequencies of the associated variants were compared with the Genome Aggregation Database (GnomAD) 37 . A search in the GWAS atlas of the associated SNPs was performed for associations with brain-related phenotypes (psychiatric, substance use, and neurological disorders). Pathway analysis was carried out with the online ComPath tool 38 , using the KEGG (Kyoto Encyclopedia of Genes and Genomes) 39,40 and Reactome databases. A clustering of paths was also performed 41 .
Polygenic risk scores (PRSs). Polygenic risk scores (PRSs) were calculated with SNPs associated with a significance threshold of p < 5.00e−05 for each statistical contrast, using Plink statistical software. Three PRSs were constructed based on the following statistical contrasts: (1) Subs and Cont (Subs-PRS), (2) Psych and Cont (Psych-PRS), and (3) Com and Cont (Com-PRS). The PRS is the sum of an individual's genetic variation adjusted by the weight of every genetic variant associated with a specific p-value threshold 21 . Next, the standardized residual of the PRS was regressed on the ten principal components of global ancestry. Based on the distributions of these PRSs, the individuals were then divided into the following groups: (1) high PRS, where Com-PRS, Subs-PRS, or Psych-PRS was higher than the third quartile; (2) medium PRS, where any PRS fell between the first and third quartiles; and (3) low PRS, where any PRS was lower than the first quartile. The selected PRS was compared among the three groups (High, Medium, and Low) with an ANOVA. Next, the prevalence of psychiatric symptoms (psychosis, depression, anxiety, post-traumatic stress, obsession/compulsion, and hypo/mania) and substance use symptoms (ever used drugs, alcohol or tobacco use) was compared among the three groups with a chi-square test. These statistical comparisons were carried out with R 42 , and significance was established as p < 0.05. These analyses considered only the individuals from the MxGDAR, because these had a more indepth phenotyping. Next, the accuracy of the PRSs to predict the phenotypes (Com, Subs, Psych, and Cont) was calculated with a multinomial regression using the nnet package. The sample was randomly divided into a training sample (70% of the individuals) and a test sample (30% of the individuals). A multinomial regression of the phenotypes in the training sample was performed on the three PRSs constructed, and the phenotype was Vertical pleiotropy. To explore whether the associated variants could perform vertical pleiotropy in the phenotypes with one exposure to an outcome, the following models were tested: (a) Subs exposure to the Com outcome, (b) Psych exposure to the Com outcome, (c) Subs exposure to the Psych outcome, (d) Psych exposure to the Subs outcome, (e) Com exposure to the Subs outcome, and (f) Com exposure to the Psych outcome. The vertical pleiotropy was tested using generalized summary-data-based Mendelian randomization (GSMR) 47 , with GCTA software. These analyses used SNPs in the GWAS at p < 5.0e−05 for each genome-wide contrast (Subs and Cont, Psych and Cont, and Com and Cont).
Genome-wide associations. Genome-wide association in the Subs group. The genome-wide association analysis found no genome-wide statistically significant association between Subs and Cont; however, seven SNPs reached nominal association (p < 5.00e−05) (Table 2a, Supplementary Table S1). Of these, three were located on intergenic regions and four on intronic regions of the LINC01622, LMO7, B3GNTL1, and C21orf91 genes. No enriched pathways were found. The only SNP reported in the GWAS atlas associated with a brain-associated phenotype was rs1304319, associated with being a morning person.
Genome-wide association in the Psych group. The comparison between Psych and Cont found a total of 40 SNPs associated at a nominal statistical level (Table 2b, Supplementary Table S2). Of these, 26 were intergenic, 13 were intronic, and one was a missense polymorphism. Associated SNPs were found on five long non-coding genes (LINC01741, LINC00578, LOC102723944, LOC105378485, and LOC105378486), eight in protein-coding regions (DPP10, PCOLCE2, SCAMP1, UBN2, NBEA, GPC6, ARHGAP23, and PTPRM); the missense polymorphism was a change of histidine for aspartic acid in the 241 position of the RGLP4 (p.His241Asp). No enriched pathways were found grouping these genes. The following SNPs were reported in the GWAS atlas as associated at a nominal genome-wide significance (p < 5.00e−05) to psychiatric or substance use phenotypes in other populations: rs12139234, rs10208402, rs7634577, rs13122202, rs10885147, rs4269860, rs4590798, rs3107346, rs10885243, rs1964449, rs589258, and rs608624.
The SNPs rs12139234 and rs1964449 have been associated with schizophrenia/bipolar disorder and depression, respectively, while rs10885147, rs4590798, rs589258, and rs608624 have been associated with alcohol consumption on a typical day, starting age of smoking, and frequency of memory loss due to alcohol consumption in the previous year. The following SNPs have been associated with other brain-associated phenotypes: rs10208402 (chronotype), rs10885243 (insomnia and intelligence), rs13132202 (educational attainment), rs4269860 (right superior frontal diusivities), and rs7634577 (superior corona radiata radial diusivities).
Genome-wide association in the Com group. The comparison between Com and Cont found 17 SNPs associated at a nominally significant level (Table 2c, Supplementary Table S3). Of the 17 SNPs, seven were intergenic, nine were intronic, and one was a missense polymorphism. The nine intronic variants were located in the proteincoding genes LRRTM4, VGLL4, MGMT, LUZP2, PWP1, and FARP1, and in the LINC02067 long non-coding gene. The missense polymorphism was a change of glutamine for glutamic acid in the 215 position of the SHPK (p.Glu215Gln). No enriched pathways were found grouping these genes.
The GWAS atlas search found five SNPs (rs12747494, rs953855, rs13033902, rs971515, and rs12702917) previously associated with psychiatric or substance use phenotypes in other populations. The SNPs rs12747494, rs971515, and rs12702917 were associated with past tobacco smoking, weekly alcohol consumption, and frequency of failure in the past year to fulfill normal expectations due to drinking, while rs953855 and rs13033902 were associated with ease of getting up in the morning and being a morning person.   Table S1). The three PRSs had an accuracy of 40.69% (CI 95% 37.81-43.62%) for the prediction of Com, Subs, and Psych. Next, the individuals were grouped based on PRS, and the differences in psychiatric symptoms and substance use patterns in individuals with different PRSs were evaluated (Table 3); these analyses included only individuals from the MxGDAR. There were statistically significant differences between individuals with different PRSs in lifetime prevalence of anxiety, lifetime smoking of at least 100 cigarettes, and problematic use of alcohol (excessive alcohol consumption, possible abuse or dependence on alcohol in the previous year, or having stopped drinking because of problems with its use).
Brain meQTL analysis of associated variants. The brain meQTLs associated with Subs, rs4787483, rs1304319, rs2824496, and rs9901757, had an effect on 269 CpG sites, 48 of which were annotated to the gene promoter (Table 4a). The greatest association was for rs2824496 affecting cg21278102, annotated to the promoter of HSBP1 (Table 4a). The genes affected by the brain meQTL associated with Subs were enriched for the insulin signaling pathway (hsa04910, adjusted p = 0.0225) and the mTOR signaling pathway (hsa04150, adjusted p = 0.0269). Of the 40 SNPs associated with Psych, 19 (rs10208402, rs10885147, rs10885243, rs1098777, rs11527950, rs12140710, rs13074870, rs1923630, rs2049156, rs2317965, rs4269860, rs4590798, rs4652252, rs589258, rs608624, rs6789946, rs762416, rs7701154, and rs905488) had a cis or trans effect on 643 CpGs sites, of which 61 were annotated to the gene promoter, with the greatest effect for the rs13074870 associated with cg26017930, annotated to SKI (Table 4b). The brain meQTLs associated with Psych were enriched for 50 pathways, including Axon guidance (hsa04360, adjusted p < 1.00e−4) and Fc gamma R-mediated phagocytosis (hsa04666, adjusted p = 1.00e−04). Of the 17 SNPs associated with the Com, seven (rs12747494, rs1039201, rs4470979, rs7078706, rs7118149, rs953855, and rs971515) had an effect on 91 CpGs sites, of which 25 were associated with the gene promoter. In these associations the greatest effects were seen by the rs12747494 affecting cg25100604 and cg275519869, CpGs sites annotated to the promoters of PCBP1 and FAM133B, respectively (Table 4c). There were no enriched pathways for the genes where the brain meQTL associated with Com had an effect.

Discussion
The comorbidity between psychiatric symptomatology and substance use leads to a significant impairment in affected individuals, but the evaluation of the phenotype has been little explored in genome-wide studies. Ours is one of the first genome-wide association studies to explore variants associated with this comorbidity, along with substance use and psychiatric symptoms alone, in the Mexican population.
Our study found that the evaluation of different phenotypes (Subs, Psych, and Com) could identify different patterns of associated variants, but that these associated variants are highly correlated and could also have pleiotropy, the effect of a single variant on different phenotypes or traits 48 . Genome-wide association studies have found a high degree of correlation between psychiatric disorders 49 , possibly because many of the associated genes could have pleiotropic effects between different psychiatric phenotypes. In this study we found vertical pleiotropy and genome-wide correlation of all phenotypes, suggesting that the associated variants could have an effect in Subs, Psych, and Com. These have been reported recently in a genome-wide study of lifelong cannabis use, which found a genetic correlation of the phenotype with different substance use disorders as well as with other mental disorders, meaning that many of the associated genes could show a pleiotropic effect in these phenotypes 15 . Other studies have suggested that the categorization of individuals into discrete diagnoses may neglect the consideration of these individuals in terms of a broader phenotype, which may be occurring in comorbidity studies [50][51][52][53][54] . The use of genetic analysis could help us to better categorize individuals with comorbidity, as in our finding that www.nature.com/scientificreports/ individuals with higher polygenic risk scores had a higher prevalence of having smoked more than 100 cigarettes in their life and a higher prevalence of risky alcohol use. Interestingly, the variants found in the PRS analysis to be associated with Com and Psych have been associated in the GWAS atlas with alcohol and tobacco phenotypes. This may be the result of the larger sample size in our study of individuals using alcohol or tobacco. We thus believe that genome-wide association studies of comorbidity might include a greater diversity of substance use disorders in order to explore this phenotype-dependent difference in GWAS signals. If so, PRS results could be used in a clinical setting to screen for individuals with a greater risk of developing the comorbidity. In order to assess the possible effects of the associated SNPs on the brain, we performed brain meQTL analysis of a previously published database of individuals of Mexican ancestry [43][44][45] . Some polymorphisms showed greater evidence of association with brain phenotypes, including rs10208402, rs12747494, and rs953855. The first, the intron SNP in the gene DPP10, for which we found an association with psychiatric disorders, has been associated with chronotype [55][56][57] . This variant exerted a trans effect on a CpG site on gene HS3ST2, also associated with chronotype 55 . We also found an association of rs953855, the SNP in the intron of LRRTM4, with the Com; it too has been associated with chronotype. A recent gene-level analysis found an association of the LRRTM4 loci with lifetime cannabis use 15 . These results suggest that the comorbidity could be associated with chronotype. The rs953855 in this study was a brain meQTL, associated in trans to a CpG site in the ELF2 gene, which has been suggested as a sensor for the elevation of extrasynaptic glutamate, modifying the growth of functional dendritic spines 58 . Glutamatergic signaling is a regulator in the reward regions of the brain that maintain the habits of psychoactive substance use 59 . The use of such substances alters this signaling; the mechanisms depend on the substance used, but the great majority promote an increase in glutamate at the synaptic level, which leads to an increase in the activation of neuronal receptors and a glutamate-dependent excitotoxicity, a mechanism dependent on Ca 2+60-64 . The increase in excitotoxicity could generate changes in neuroplasticity, leading to an increase in drug-seeking behaviors and in the memories associated with drugs 65 . The intergenic SNP, rs127474, for which we found an association with the comorbidity, is also associated with ever/never used tobacco 57 . This SNP had a trans effect on two CpG sites located in the promoters on PCBP1 and FAM133B. PCBP1 is part of the DISC1 interactome 66 , which is essential in the development of brain cells, and alterations in this area could lead to neurodegenerative disorders [66][67][68] . Peripheral levels of DISC1 have been proposed as a marker of nicotine exposure 69 , supporting the possibility of an association between rs127474 and tobacco-related behavior.
On the pathway/functional level, we found that brain meQTLs associated with Subs and Psych have a greater effect on brain pathways than those associated with Com. Those associated with Subs modified the insulin signaling pathway, while those associated with Psych modified the Fc gamma receptor mediated phagocytosis. Insulin signaling in the periphery plays a crucial role in the homeostasis of plasma glucose levels, but the effect of insulin on the central nervous system is less recognized. Insulin in the CNS is involved in cell survival, neurogenesis, receptor trafficking, and neurotransmitter release/reuptake [70][71][72][73] . The insulin pathway has been associated with substance addiction in animal and human studies, and in integrative bioinformatics analysis of different omic data 74 . The mechanism underlying the effect of insulin in substance use is not fully elucidated, but evidence points to a dysregulation of dopamine in brain reward circuits [75][76][77][78][79] . The Fc gamma receptor mediated phagocytosis pathway, for which we found brain meQTLs associated with psychiatric symptoms, was recently associated with schizophrenia and bipolar disorder through analysis of transcriptomes and co-localization of GWAS signals 80 . Zhao et al. have suggested that the alteration of the Fc gamma receptor pathway could affect lysosomal function, and found that individuals with lysosomal dysfunction had greater manifestations of psychiatric symptoms 81 , suggesting its possible importance in the manifestation of this symptomatology.
The genome-wide estimated variance in the Com group was greater than in those who presented only one disorder, suggesting that the risk from genetic variability could be greater in this group. Interestingly, the explained variance in the Subs group was close to zero. This lesser explained variance could mean that the effects of the common variants analyzed in the microarray are not sufficient to capture the genetic effect in this group, possibly because of an underestimation, possibly because of the effects of uncommon variants not explored in this study. An examination of these phenotypes with other sources of genetic variation is thus needed, and of the way in which this genetic risk could interact with environmental factors such as exposure to trauma and social tolerance of substance use in producing the comorbidity.
Although this study identified associations that could be the basis for future functional studies of the relationship between genetic variability and comorbidity, some limitations should be noted. The main one is the use of two samples, one a population sample and the other clinical, where the criteria for defining the phenotype could be a source of heterogeneity. However, the inclusion of both populations increases the sample size and facilitates the identification of associations. Even with these limitations, we believe that this study, with the sample size it offers for investigation of the genome-wide association, provides important information for the understanding of the comorbidity between psychiatric symptomatology and substance use in the Mexican population.
This study found genetic associations of SNPs that modulate brain DNA methylation levels in genes involved in the insulin signaling pathway and Fc gamma receptor phagocytosis with the comorbidity between psychiatric symptomatology and substance use in the Mexican population. These results suggest new paradigms for understanding how genetic variability regulates comorbidity.