Analysis of whole genome-transcriptomic organization in brain to identify genes associated with alcoholism

Alcohol exposure triggers changes in gene expression and biological pathways in human brain. We explored alterations in gene expression in the Pre-Frontal Cortex (PFC) of 65 alcoholics and 73 controls of European descent, and identified 129 genes that showed altered expression (FDR < 0.05) in subjects with alcohol dependence. Differentially expressed genes were enriched for pathways related to interferon signaling and Growth Arrest and DNA Damage-inducible 45 (GADD45) signaling. A coexpression module (thistle2) identified by weighted gene co-expression network analysis (WGCNA) was significantly correlated with alcohol dependence, alcohol consumption, and AUDIT scores. Genes in the thistle2 module were enriched with genes related to calcium signaling pathways and showed significant downregulation of these pathways, as well as enrichment for biological processes related to nicotine response and opioid signaling. A second module (brown4) showed significant upregulation of pathways related to immune signaling. Expression quantitative trait loci (eQTLs) for genes in the brown4 module were also enriched for genetic associations with alcohol dependence and alcohol consumption in large genome-wide studies included in the Psychiatric Genetic Consortium and the UK Biobank’s alcohol consumption dataset. By leveraging multi-omics data, this transcriptome analysis has identified genes and biological pathways that could provide insight for identifying therapeutic targets for alcohol dependence.


INTRODUCTION
Alcohol dependence (AD) can be defined as a cluster of physiological, behavioral, and cognitive phenomena in which the use of alcohol takes a much higher priority for a given individual than other behaviors that once had greater value (American Psychiatric Association 1994) 1 . The development of AD is characterized by frequent episodes of intoxication, preoccupation with alcohol, use of alcohol despite adverse consequences, compulsion to seek and consume alcohol, loss of control in limiting alcohol intake, and emergence of a negative emotional state in the absence of the drug (American Psychiatric Association 1994) 1

. The changes in behavioral
priorities not only results in increased morbidity and mortality, it is also a substantial social and economic burden on individual families and the nation 2 .
In individuals with alcohol dependence, there is a complex interplay between genetic background, environmental factors, and history of alcohol exposure 3 . Alcohol crosses the blood brain barrier and triggers changes in the central nervous system 4 , including transcriptional changes in many different regions of the brain [5][6][7][8][9] . The transcriptional effects of long-term alcohol consumption are associated with pathways involved in the neuro-immune system, neurotoxicity, and changes in neuroplasticity 6,7,9 . Transcriptomes from complex tissues, such as human brain, may be organized into networks of co-expressed genes that better reflect the biological functions and organization of the tissue [7][8][9][10][11][12][13][14] . Application of bioinformatics techniques, such as weighted gene co-expression network analysis (WGCNA) 15 , has uncovered networks associated with alcohol dependence 8,9 . However, past studies were performed on small numbers of AD cases, thus limiting the statistical power to detect small changes in alcohol-induced gene expression. In this study, we utilized massively parallel sequencing of RNA transcripts from postmortem human prefrontal cortex (PFC) of 65 alcoholics and 73 controls of European descent to explore transcriptional networks and genetic variation and identified groups of coexpressed genes associated with alcohol dependence. Our analysis provides systems-level evidence of genetic networks within the PFC that contribute to the pathophysiology of alcohol drinking behavior in humans.

Case selection and postmortem tissue collection
Human autopsy brain samples were obtained from the New South Wales Tissue Resource Centre at the University of Sydney (http://sydney.edu.au/medicine/pathology/btrc/). Fresh frozen samples of the superior frontal gyrus (Brodmann area 8; referred to as prefrontal cortex (PFC) in this manuscript) were collected from each postmortem sample. Brain tissue was sectioned at 3 mm intervals in the coronal plane. Alcohol dependent diagnoses were confirmed by physician interviews, review of hospital medical records, questionnaires to next-of-kin, and from pathology, radiology, and neuropsychology reports. Tissue samples were matched as closely as possible according to age, sex, post-mortem interval, pH of tissue, disease classification, and cause of death. To be included as part of the alcohol-dependent cohort, subjects had to meet the following criteria: greater than 18 years of age, no head injury at time of death, lack of developmental disorder, no recent cerebral stroke, no history of other psychiatric or neurological disorders, no history of intravenous or polydrug abuse, negative screen for AIDS and hepatitis B/C, post-mortem interval within 48 hours, and diagnosis of AD meeting the DSM-IV criteria 1 .

Sample preparation
The Qiagen RNeasy and Lipid Tissue kit (Qiagen, Valencia, CA, USA) was used to extract total RNA from human PFC brain tissue, and RNA concentration was measured with a NanoDrop 8000 spectrophotometer (ThermoFisher Scientific

Selection of covariates to for analyses
Linear regression: We first performed a linear regression with alcohol dependence as a dependent variable to identify possible covariates (e.g. sex, age, PMI). The mean age of AD subjects was 55.65 years and was not significantly different from the age of control subjects (54.96) ( Table 1). There was no significant difference in distribution of RIN and brain pH between cases and controls ( Table 1). Postmortem interval (PMI) was significantly lower for the alcohol dependent subjects.

Variance partition analysis:
We used the variancePartition package 18 in R to calculate the proportion of variance in RNA expression explained by known covariates such as age, gender, RIN and PMI, using the variancePartition package in R. The variancePartition 18

Differential gene expression analysis
Gene-level analyses started with the featureCounts-derived sample-by-gene read count matrix.
The basic normalization and adjustment pipeline for the expression data matrix consisted of: (i) removal of low expression genes ( < 1 CPM in > 50% of the individuals); (ii) differential gene expression analysis based upon adjustment for the chosen covariates. We filtered out all genes with lower expression in a substantial fraction of the cohort, with 18,463 genes with at least 1 CPM in at least 50% of the individuals; note that only these genes were carried forward in all subsequent analyses. The following design was used for the final differential expression analysis using the DeSeq2 19 package as implimented in R: gene expression ~ DSM4 alcohol classification +sex + age + PMI + RIN + batch.

Pathway analyses of differential expression
Ingenuity® Pathway Analysis (IPA®) was used to perform pathway, canonical pathways, and causal network analysis. All genes that passed the threshold of significance at 25% FDR were included in the analysis.

Gene ontology analysis
Gene ontology analyses were performed using the clusterProfiler package 20 as implemented in R. All differentially expressed genes that passed the threshold of significance at 25% FDR were included in the analysis. Results for the enrichment analysis were extracted and plotted using the ggplot2 package in R.

Gene co-expression analysis
Scale-free co-expression networks were constructed using the weighted gene coexpression network analysis (WGCNA) package in R 15 . WGCNA provides a global perspective, emphasizing the correlation between genes to classify different molecular groupings, rather than focusing on individual genes. WGCNA defines modules using a dynamic tree-cutting algorithm based on hierarchical clustering of expression values (minimum module size=100, cutting height=0.99, deepSplit=TRUE). The networks were constructed at a soft power of 14 at which the scale free topology fit index reached 0.90 (Supplementary Figure 2B). We further merged modules that had similar co-expression patterns by calculating the eigengenes and merging those having a correlation > 75% (Supplementary Figure 2C). Correlation of module eigengenes with alcohol dependence, alcohol consumption, AUDIT scores and number of years of drinking (module-trait correlation analysis) was evaluated using Spearman's rank correlation analysis. We used the DSM4 criteria for alcohol dependence classification as provided by the New South Wales Tissue Resource Centre at the University of Sydney. For each individual in the RNA-Seq dataset a module eigen value was calculated for each module. This module eigen value was used to perform the correlation analysis of the traits (e.g. alcohol dependence, alcohol consumption and Audit scores) with each whole module. Digital deconvolution showed no significant differences in the percentage of neurons, astrocytes and microglia in the PFC of alcoholics and controls (Supplementary Figure 3) 21 ; therefore we did not perform any correction for cell-type heterogeneity. Assigned modules were functionally annotated against known molecular/functional categories and pathways using Ingenuity Pathway Analysis (IPA).

GWAS enrichment analysis
The summary statistics from a GWAS of alcohol dependence (PGC-AD) were provided by the Psychiatric Genetics Consortium Substance Use Dependence working group 22  The two modules (thistle and brown4) that showed significant enrichment (p < 0.05) in the Fisher exact test were subjected to 100,000 permutations to report the final P value of enrichment. We also performed the gene based analysis by Multi-marker Analysis of GenoMic Annotation (MAGMA) 26 on summary statistics of PGC-AD, UKBB-AC and TAG-CPD GWAS using Functional Mapping and Annotation of GWAS (FUMA-GWAS) 27 . The summary statistics of this gene based analysis were overlaid on the IPA networks to identify the genes in these networks that also have nominal to moderate evidence of genetic contributions.

Differential expression analysis
Analysis of PFC tissue derived from 65 alcoholics and 73 controls identified 827 differentially expressed genes at 25% FDR, 298 genes at 10% FDR and 129 genes at 5% FDR ( Figure 1A Figure 1B). Gene-ontology enrichment analysis using clusteProfiler mapped a large proportion of genes to biological processes involved in blood coagulation and fluid transport ( Figure 1C). The network analysis in IPA mapped the significant genes to networks involved in neurodegenerative disorders and organismal injury. Several genes that were part of this network were also nominally significant (p < 0.05) in the PGC-AD and UKBB-AC GWAS ( Figure 1D).

Identification of gene co-expression networks and modules
After correcting for the effects of batch, age, and RIN, the hierarchical clustering of expression data from nearly 18,000 genes generated 27 different modules (Supplementary Figure 1). Traitmodule correlation analyses identified five modules that were significantly correlated to at least one alcohol related trait (Figure 2). Of these five modules, the thistle2 module (containing 72 genes), was negatively correlated with alcohol dependence and other alcohol related traits. The brown4 module (containing of 795 genes) was positively correlated with AD, AUDIT, alcohol consumption and duration of alcohol use.

Thistle2 module
Pathway enrichment analysis of the thistle2 module showed a significant down-regulation of pathways related to calcium signaling ( Figure 3A). Gene-ontology enrichment analysis using the clusterpProfiler showed significant enrichment for biological processes involved in "response to nicotine" and "excitatory postsynaptic potential" ( Figure 3B). Several genes in the thistle2 module that were significantly down-regulated in the PFC of alcohol dependent subjects.

Brown4 module
Pathway analysis for differentially expressed genes in the brown4 module showed significant enrichment for Growth Arrest and DNA Damage (GADD45) signaling and for biological processes related to the inflammatory response ( Figure 4). Other genes that were also significantly upregulated in the PFC of alcoholics mapped to networks involved in infectious and respiratory diseases.

GWAS enrichment analysis
GWAS enrichment analysis of significant eQTLs (P < 5 X 10 -8 ) for all genes in the top 5 modules (ranked by P value in module-trait correlation analysis), showed evidence of enrichment for SNPs associated with AD (GWAS p < 0.05) in PGC-AD and alcohol consumption in UKBB-AC datasets. The brown4 module was also enriched for GWAS association in the TAG-CPD dataset. The thistle2 module did not show enrichment of GWAS association. Surprisingly, genes in the thistle2 modules were significantly depleted for GWAS signals in the PGC-AD and UKBB-AC GWAS analyses. This finding was confirmed by permutation analysis.

Discussion
To our knowledge, this is the largest transcriptome analysis comparing PFC of alcoholdependent cases and controls. The present study identified 129 genes (FDR < 0.05) that were differentially expressed in alcohol dependent subjects (Supplementary table 1). FKBP5, a well studied gene that is asoociated with alcohol use [28][29][30][31] , showed increased expression in the PFC of alcohol dependent subjects in our differential gene expression analysis (l 2 FC 0.27; P = 4.57 x 10 -7 ). Other studies have also shown that FKBP5 plays a role in alcohol drinking behaviors in rodents 28,29 and humans 32 . FKBP5 encodes FK506-binding protein 5, a glucocorticoid receptor (GR)-binding protein implicated in various psychiatric disorders and alcohol withdrawal severity 30 . Qiu and colleagues 30 reported that Fkbp5 KO mice exhibited increased alcohol consumption compared with wild-type mice. Another study has shown that the absence of Fkbp5 enhances sensitivity to alcohol withdrawal in mice 33 . Recent findings also suggested that Fkbp5 expression in mesocorticolimbic dopaminergic regions is associated with early life-stress mediated sensitivity to alcohol drinking and that there is a gene environment interaction among FKBP5 genotype and parent-child relationship that influences alcohol drinking.
Genes showing significant differences in expression between alcohol dependent subjects and controls were enriched in pathways related to interferon and GADD45 signaling (Figure 1 B).
Interferons are cytokines that have antiviral, antiproliferative, and immunomodulatory effects and the interferon pathway plays a critical role in human innate and adaptive immune responses 34 . Our pathway analysis results are consistent with earlier findings showing induction of innate immune genes by stress and drug abuse 35,35 . Furthermore, mRNA expression studies in human brain showed significant changes in expression of genes related to immune or inflammatory responses in hippocampus 7 and nucleus accumbens 8 . The neuroinflammation associated with chronic alcohol exposure and withdrawal may be attributed to microglial activation and is associated with the neuropathology of chronic alcohol exposure 36 . Differentially expressed genes (FDR < 25% ) also mapped to networks associated with neurodegenerative disorders and organismal injury (Figure 1 D). Many differentially expressed genes in this network are involved in nervous system development and function. Specifically TRPC3 and calcium dependent protein kinase 4 (CAMK4) are involved in excitatory post-synaptic current while Ampa receptor, Glutamate Ionotropic receptor AMPA type subunit 4 (GRIA4), Calcium dependent protein kinase ii (CaMKII) and CAMK4 are involved in synaptic transmission.
Although we identified several genes that were differentially expressed in the PFC of alcoholdependent subjects, the variance explained by individual genes was very small (0.15-1%). The differential expression observed here is smaller than that reported in earlier differential expression studies of alcoholism, but it is consistent with differential expression studies of larger sample size 37 . For example, the CommonMind consortium reported similar fold changes in the differential expression study of schizophrenia and they showed that their observation is consistent with plausible models for average differential gene expression and the polygenic inheritance of schizophrenia. The polygenicity of AD has also been observed by the GWAS of alcoholism and other complex behavioral/psychiatric disorders 22,[38][39][40][41] , and it was demonstrated that effect size for each individual genetic variant is very small. Studies that used a co-expression network approach also showed that alcohol dependence is shaped, in part, by persistent alterations in networks of co-expressed genes that collectively mediate excessive drinking and other alcohol-dependent phenotypes 8,9 . These and other studies also demonstrated that the gene network structure is significantly correlated with lifetime alcohol consumption in addition to an overall loss in network structure; furthermore, the neurobiology of alcohol dependence may be due to altered covariation of gene modules, rather than discrete changes in differentially expressed genes across the transcriptome 9,13 .
Trait-module correlation analysis for the thistle2 module showed a significant negative correlation with alcohol dependence (-0. 28 Pathway analysis showed that all genes that overlapped with genes involved in calcium signaling were significantly downregulated ( Figure 3A). Acute ethanol exposure has been shown to inhibit Ca 2+ currents induced by PKC-dependent phosphorylation of mGluR5 in neurons 42 . Early studies in PC12 cell cultures also showed that ethanol has a significant inhibitory effect on the influx of Ca 2+ through L-type voltage-gated Ca 2+ channels 43 . Alcohol exposure also modulates Ca 2+ signaling between astrocytes and neurons 44 (Warden et al, 2016), and Ca 2+ acts as a second messenger that controls multiple processes in immune cells, including chemotaxis and secretion of pro-and anti-inflammatory cytokines. Our analyses provide further evidence that alcohol exposure alters Ca 2+ signaling in the brains of alcoholics and could potentially alter communication between neurons and brain immune cells. Another module that correlated with alcohol dependence, brown4, was also enriched in immune response and infectious diseases, providing additional evidence for the role of the neuroimmune system in the etiology of alcohol dependence. Some of the differentially expressed genes in this network were also statistical significant in the gene-based tests (RASD1, UKBB-AC, P = 1.64 x 10 -5 and ARID5A, UKBB-AC, P 1.4 x 10 -3 ). The differentially expressed FKBP5 gene was also part of the brown module, but it was not identified as hub gene according to intra-modular connectivity (supplementary   table 2). Enrichment analysis of nominally significant GWAS variants (p < 0.05) that were also eQTLs (p < 5 X 10 -8 ) for genes in the thistle2 module showed significant under enrichment in the two-tail Fisher test. The under-enrichment remained significant even after 100,000 permutations. This might be due to the small size of this module (N = 72 genes). Although some of the differntialy expressed genes were significant in the gene-based tests performed in UKBB-AC and PGC-AD datasets using MAGMA (CHRNA6, CHRNA2, MN1 and HAPLN1). In the calcium signaling network (Figure 3 C), a few genes that were not part of the thistle2 module, but were essential to create network connections, were also found to be significant (3.4 x 10 -2 < P > 4.8 x 10 -2 ) in the gene-based tests (circled in red; Fig 4C). This suggests possible gene-environment (alcohol exposure) interactions in the etiology of alcohol dependence. This also reinforces the need for multi-omics data to understand a complex disorder like alcoholism. eQTLs for genes in the In the present study, we focused on integrating the genomic information to transcriptomic data to identify gene (genetic background) x environment (alcohol exposure) interactions in the etiology of alcohol use disorders. As mentioned in the discussion we identified that genes that have altered expression due to alcohol exposure interact with risk genes (GWAS) to increase an individual's risk of becoming dependent on alcohol. So, to translate these findings in animals, one has to mimic expression of hub genes as well as the risk gene to alter the pathways associated with alcoholism. We are also reporting the direction of effect of the differential expression. That should provide information that can be used to see whether knock-down or overexpression of key genes alters risk for AUD phenotypes in models. Also, the replication of the modules in rodent models indicates which models might be useful to study the effects of dysregulation in these models.
Multiple lines of evidence derived from this study allowed us to prioritize the genes altered by exposure to alcohol. The gene co-expression network analysis also identified networks of genes altered in alcohol-dependent subjects. Further support for our findings comes from work showing that many genes in these networks were also associated with alcohol dependence and alcohol consumption in large GWAS study cohorts. This systematic exploration of transcriptomic organization in the PFC from alcoholics provides further support for the role of the neuroimmune system in alcohol dependence. The biological pathways and networks of genes identified in the current study will help prioritize genes for functional studies and may help advance targeted treatment approaches for alcohol use disorders.
We continue to be inspired by our memories of Henri Begleiter and Theodore Reich, founding PI and Co-PI of COGA, and also owe a debt of gratitude to other past organizers of COGA, including Data sharing policy: RNA-Seq data for alcohol dependence cases and controls will be available through GEO. All the summary statistics for differential expression will also be posted at INIA and COGA's homepage (and Shiny web app) and will be freely available to download after publication is online.