Deep transcriptome sequencing of subgenual anterior cingulate cortex reveals disorder-specific expression changes in major psychiatric disorders

How do differences in onset, symptoms, and treatment response arise between various mental illnesses despite substantial overlap of genetic risk factors? To address this question, we carried out deep RNA sequencing of human postmortem subgenual anterior cingulate cortex, a key component of limbic circuits linked to mental illness. Samples were obtained from 200 individuals diagnosed with bipolar disorder, schizophrenia, or major depression, and controls. Differential expression analysis in cases versus controls detected modest differences that were similar across disorders, although transcript-level differences were more pronounced. Case-case comparisons revealed greater expression differences between disorders, including many genes and transcripts that were expressed in opposite directions in each diagnostic group, compared to controls. Relative transcript abundances were associated with common genetic variants that accounted for disproportionate fractions of diagnosis-specific heritability. Inherited genetic risk factors shape the brain transcriptome and contribute to diagnostic differences between broad classes of mental illness.


INTRODUCTION
Genome-wide association studies (GWAS) have revealed that major psychiatric disorders share many common alleles with small additive effects on risk 1,2 . Similarly, gene expression studies in post-mortem brain tissue have found that expression changes in mental illnesses tend to be small and correlated across diagnoses. One large post-mortem study 3 found small changes in gene expression in schizophrenia (SCZ) versus controls that seemed to reflect the small frequency differences of common alleles associated with SCZ by GWAS 4 . A meta-analysis of microarray expression data drawn from several post-mortem studies found that many differentially-expressed (DE) genes were shared across major psychiatric disorders such as SCZ, autism, and bipolar disorder (BD) 5 . A large follow-up study that used RNA sequencing confirmed substantial overlap in DE genes in SCZ, BD, and autism spectrum disorder, versus controls, although transcript (isoform) level expression was more variable 6 .
If genetic risk factors and gene expression changes in brain are as similar across different mental illnesses as these studies seem to suggest, then how do diagnostic differences in onset, symptoms, and treatment response arise at all? One possibility is that diagnostically distinct genetic risk factors account for some of these differences. For example, genetic correlations as high as the 70% reported between SCZ and BD 1 are consistent with ~50% of risk alleles differing between disorders. It is also possible that published gene expression studies have overestimated the degree to which gene expression changes in post-mortem brain are similar across disorders. Even the largest sample sizes have been underpowered to detect subtle gene expression differences 3 , and those differences that have been detected may not be representative. Case-control study designs limit opportunities to detect true differences between case groups, and few published post-mortem gene expression studies have focused on direct, case-case comparisons between disorders. Moreover, few if any human postmortem studies have used methods that reveal most of the transcriptional complexity of the brain, where numerous gene isoforms are generated by alternative splicing and other mechanisms. How much of the apparent resemblance in gene expression reported among clinically distinct mental illnesses can be explained by these methodological limitations?
To address this question, we performed deep sequencing of the brain transcriptome followed by casecontrol and case-case comparisons of both gene and isoform expression in tissue obtained post-mortem from 200 individuals diagnosed with BD, SCZ, major depressive disorder (MDD), or no known psychiatric illness. Tissue was dissected from subgenual anterior cingulate cortex (sgACC), a key component of limbic circuits that have been implicated in major mood disorders and other mental illnesses 7,8 . Total RNA was sequenced at an average depth of ~270 million reads per sample, generating high-quality expression data for ~21,000 genes and ~85,000 transcripts in 185 samples. We believe these data represent the most complete sampling of the human brain transcriptome published to date.
The results demonstrate that subtle differences in gene expression observed between broad diagnostic classes of mental illness actually reflect more pronounced and diagnosis-specific changes at the transcript level, and that these changes are shaped by inherited genetic risk factors.

Transcriptome Profile of the Subgenual anterior cingulate cortex (sgACC)
Total RNA from 200 postmortem brain samples (Controls=61, SCZ=46, BD=39, MDD=54) was captured using the Ribo-zero protocol (see Methods) followed by stranded, paired-end sequencing. The samples, along with demographic information and known covariates are presented in Supplementary Table 1 In an effort to detect a more complete set of genes and transcripts than previous studies, RNA sequencing was performed at very high depth, generating ~50 billion reads at an average of ~270M reads/sample. Of these, 167M reads/sample mapped to ~21,000 Ensembl genes and ~85,000 Ensembl transcripts; identifying ~50% more transcripts than those used in previous studies 6,9,10 . The mapping summaries are presented in Supplementary Table   2. Very highly expressed transcripts with >10,000 reads/transcript comprised 1% of the total reads, while transcripts with <100 reads comprised 56% (Supplementary Figure 2). We detected an average of 5 isoforms/gene. The greatest number of isoforms (49 each) were detected for the genes MUC20-OT1 (a noncoding RNA) and TCF4, which has been associated with SCZ, autism, and intellectual disability 11-13 . in MDD vs SCZ and 5 genes that were DE in MDD vs. BD. The gene IDUA was expressed in opposite directions in each of these disorders, relative to controls. Supplementary tables 7, 8 and 9 show all genes DE at p<0.05 in each of the case-case comparisons.
Expression Quantitative Trait (eQTL) analyses. Genome wide association studies (GWAS) have identified numerous common variants associated with SCZ, BD, or MDD. In order to assess the effects of these variants on gene expression, we carried out an eQTL analysis with Matrix eQTL (see Methods). All samples were genotyped, imputed and filtered for common markers. This produced ~6M single nucleotide polymorphisms (SNPs) that were tested against expression of ~21K genes, resulting in ~92M eQTLs in cis (within 1MB upstream or downstream of a gene). Of these, ~290K eQTLs were associated with expression of 6,272 genes at FDR<5% (designated "eGenes," Supplementary Table 10). We performed a similar analysis only in the Caucasian samples, resulting in ~99K cis-eQTLs for 4K genes at FDR<5%. eGenes from sgACC were compared to the ~9K eGenes reported in DLPFC by the Common Mind  Table 11. A total of 2,741 eGenes were detected only in sgACC, 881 of which were not found in the largest postmortem brain study to date 6 . Of these putative sgACC-specific eGenes, 521 had low expression in sgACC, with base mean < 100 counts (Supplementary Table 11), suggesting that they may have been missed by less deep RNA sequencing.

Summary-data-based Mendelian Randomization (SMR) analyses.
In order to identify SNPs that were pleiotropically associated with both clinical diagnosis and gene expression in brain, we integrated published GWAS with brain eQTLs using SMR 17 . To increase power, we combined our eQTL results (Caucasian-only) with those detected in GTEx-ACC (~100 samples) and those found in CMC-DLPFC (~500 samples), thus boosting the total sample size to ~800.
As reported previously 18 , sample size was strongly predictive of the number of significant SMR "hits." In SCZ, we detected 20 pleiotropic variants in sgACC at FDR<5%. This number increased to 36 in the combined sgACC and GTEx-ACC samples, and increased further to 133 when the CMC-DLPFC samples were added (Supplementary Table 12). Some of the genes reported here for the first time in association with a psychiatric GWAS locus include: CORO7, an F-actin regulator thought to play an essential role in maintenance of Golgi apparatus morphology; TSHR, which codes for the thyroid stimulating hormone receptor, previously reported to harbor rare, loss-of-function mutations in ADHD 19 ; and ORMDL3, which plays a role in sphingolipid synthesis and regulation of Ca2+ levels in the endoplasmic reticulum 20 .
Integration of our eGene data with the latest BD GWAS results detected 4 pleiotropic variants, increasing to 16 when all brain samples were considered (Supplementary Table 13). This SMR analysis replicated 4 genes reported in previous SMR studies of BD 6,21,22 : DDHD2, MCHR1, FADS1, and PACS1. The present study detected 5 novel BD eGenes, including: ORMDL3, noted above in relation to SCZ; XPNPEP3, an ezyme involved in glomerular filtration, protein processing, and alternative splicing; PDF, a mitochondrial protein involved in cell proliferation; and 2 non-coding RNA genes.
Similar analyses in MDD yielded 4 significant hits, but only in the combined dataset. (Supplementary   Table 14). Exemplary genes include DLST, implicated in a recent anxiety GWAS 23 ; and B3GALTL, mutations in which causes Peters-plus syndrome, characterized by developmental delay 24 .

Transcript-level Analyses
Case-Control comparisons. RNAseq was carried out at high read depth in order to capture the highest number of detectable transcripts from the sgACC. Transcript-level comparisons between cases and controls showed 2-3 times more significant differences than the gene-level comparisons, despite a greater multiple-testing burden. At FDR<5% the number of DE transcripts was: 470 in SCZ, 380 in BD, and 286 in MDD, a gradient that is consistent with typical clinical severity across disorders (Figure 1c). Most (>90%) of the DE transcripts were predicted to be protein-coding (Supplementary Table 15). Many of the DE transcripts showed read counts <100, and were detected as DE in mental illness for the first time in this study (Supplementary Figure 6). All DE transcripts in SCZ, BD and MDD with p<0.05 are presented in Supplementary Tables 16-18. In contrast to the gene level analyses, transcript-level analyses detected substantial diversity in DE transcripts across disorders. The mean overlap of DE transcripts across disorders was 18% (Figure 2b), and the correlation of FC values across diagnoses (linear regression R 2 ) was 0.43-0.44% (Figure 3b). For some genes, the same transcript was DE in all 3 disorders (e.g., NR4A1-213, Figure 4), but there were several instances of alternative isoform usage, where distinct transcripts of the same gene were DE in different disorders (e.g.,   [22][23][24]. Since the absolute difference between case groups is larger for opposing transcripts (Figure 5d-5e), when power is limited such transcripts should be more easily detected in case-case than in case-control comparisons.
Comparisons of major mood disorders (BD and MDD) compared to SCZ found additional transcripts that were regulated in the same direction in the mood disorders and in the opposite direction in SCZ (Supplementary  Table 26). These eQTLs significantly overlapped with transcripts that were DE in SCZ (hypergeometric p-value<10 -5 ) or in BD (hypergeometric p-value<0.01), but not in MDD, suggesting that common alleles contribute to disorder-specific differential expression at the transcript level.

SMR analyses.
To quantify the degree to which risk alleles contribute to transcript-level differences between disorders, SMR analysis was carried out with the transcript-eQTLs and summary results from recent GWAS. SNPs pleiotropically associated with both transcript expression and disease (FRD<5%) included 18 associated with SCZ and 4 associated with BD. Some of the significant SCZ loci include transcripts of FAM134, KATNAL2, and ZNF738. Signifcant BD loci include transcripts of FADS1, XPNPEP3, MAP1LC3A, and ARPC3 (Supplementary Table 27).

Validation of DE genes and transcripts
Validation of the RNAseq analyses was carried out in the same 185 samples using real time quantitative PCR (RT-qPCR) using 6 genes and transcripts with read counts over 500 (see Methods). All tested genes and transcripts showed the same direction of effect with respect to diagnosis as was detected in the RNAseq analyses.
This result was significantly different from chance (binomial p-value of 0.016).

Functional enrichment of DE genes and transcripts
Functional enrichment of analysis was carried out with DAVID. 27 No significant functional enrichment was detected for any gene-level results, but DE transcripts did reveal some significant enrichment, consistent with the improved resolution reported above for transcript-level comparisons. Genes harboring transcripts that were DE at FDR<5% in any of the case-control comparisons were significantly enriched for functional annotations related to synapse and antigen processing (Supplementary

Effects of Common Variants on Transcript Abundance
In addition to changes in gene or transcript-level expression, SNPs can modify the transcriptome by driving shifts in the relative abundance of transcripts within a gene 28 . Such SNPs are known as sQTLs, due to their putative effect on alternative splicing 29 . Accordingly, we assessed the relationship between common variants and relative abundances of all detected transcripts in each of the ~21K genes included this study. This analysis detected ~90K sQTLs associated with transcript abundance of ~5K genes at p-value<0.05 (Supplementary Table 31).
A similar analysis restricted to the Caucasian samples detected ~75K sQTLs in ~4K genes at p-value<0.05 (Supplementary Table 32). As expected, most (63%) transcripts associated with one or more sQTLs were predicted to result from classical alternative splicing events, including exon skipping, intron retention, alternative donor, or alternative acceptor sites (Supplementary Figure 7). These genes overlapped significantly (hypergeometric p < 1.11e-16) with those identified by different methods in the CMC data 30 . About 20% of these sQTLs are intronic variants located within 1 KB of a known splice site (Supplementary Table 33).
Genes harboring one or more sQTLs significantly overlapped wiht genes harboring DE transcripts from the case-control (n=368; OR=2.2, hypergeom p<10 -16 ) and the case-case (n=335; OR=2.5, hypergeom p<10 -16 ) comparisons. No significant functional enrichment was detected for the overlapping genes from the case-control analyses, but case-case overlaps were significantly (FDR<5%) enriched for synapse, post-synaptic density, cell junction, and spectrins. Box plots of genes harboring sQTLs and DE transcripts are shown in Supplementary sQTLs within GWAS loci. As observed with eQTLs, some sQTLs may be pleiotropically associated with disease risk. To investigate this, we searched for SNPs that were identified as risk alleles by GWAS and as sQTLs in the present study. We used only those sQTLs that were detected in sgACC from self-reported Caucasian brain donors, since existing GWAS are mainly based on samples of European ancestry,.
Overall, about 10-25% of GWAS loci we investigated harbored one or more significant sQTLs. In SCZ, of 430 genome-wide significant GWAS SNPs 25 , 56 were identified as an sQTL in the present study, implicating 44 genes. Of these, 15 different genes are linked here for the first time to functional variation within a SCZ GWAS locus (Supplementary Table 35). In BD, of 329 genome-wide significant GWAS SNPs 21 , 47 were identified as sQTLs (Supplementary Table 36. Some of the novel implicated genes include ATP11B, NCAM1, SLC12A9, and SLC4A10. In MDD, 10 out of 44 GWAS loci 26 were found to harbor sQTLs, implicating the genes BAG6, KCNQ5 and ERBB4, among others (Supplementary Table 37).

Relative contributions of eQTLs and sQTLs to disease heritability. Previous studies have
demonstrated a substantial contribution of eQTLs to heritability for a variety of common disoders 33 , but few studies have investigated the specific contribution of sQTLs. Thus we estimated the proportion of heritability that could be explained by sQTLs, relative to eQTLs, in some major mental illnesess.
Both eQTLs and sQTLs accounted for disproportionate fractions of heritability in SCZ, BD, and MDD (Table 3). However, sQTLs, which comprised only 1% of all SNPs, explained 16-21% of the heritability, a 14-18 fold enrichment (p<0.01). sQTLs showed even greater enrichment of heritability for some other major psychiatric disorders, but not all. sQTLs explained 29% of the heritability of autism spectrum disorders (25-fold enrichment, p<0.001) and 41% of the heritability of attention deficit hyperactivity disorder (36.6-fold enrichment, p<0.001), but showed no significant enrichment for Alzheimer Disease or anxiety disorders.

DISCUSSION
We have employed deep transcriptome sequencing in post-mortem brain tissue to address a central question in the pathophysiology of mental illness: How do diagnostic differences in onset, symptoms, and treatment response arise despite substantial overlaps of genetic risk factors across disorders? The findings show that subtle differences in gene expression observed between broad diagnostic classes of mental illness actually reflect more pronounced and diagnosis-specific changes at the transcript level. These changes often involve alternative isoform usage, are most evident in case-case comparisons, and are influenced by common genetic variants that also play a disproportionately large role in the inherited risk of mental illness.
These findings should be considered in light of several limitations. While this was one of the larger RNA sequencing studies performed to date in human brain tissue, sample size was limited, with less than 100 individuals in each diagnostic category. This limits power to detect small changes in expression. This limitation was offset by complementing the typical case-control comparisons with case-case comparisons between distinct psychiatric diagnoses. This approach should be more powerful in the face of opposing expression changes between case groups, which we detected for hundreds of genes and transcripts. As in all human post-mortem studies, diverse factors effect post-mortem expression, not all of which can be measured or controlled, increasing noise and introducing the risk for bias. We attempted to address this by stringent quality control, careful adjustment for key covariates, and confirmation of top findings by use of alternative approaches. Also inherent in human post-mortem samples is the inability to draw confident distinctions between expression changes that cause disease and those that result from disease or its treatment. The kind of integrative genomic approaches we employed, which draw causal inferences on the basis of published GWAS 17 , partially address this issue but still lack power to distinguish between correlated causal and non-causal events. This study focused on a single brain region, albeit one that has been little studied, so cannot shed light on transcriptional differences between brain regions that may be important in disease. While we employed very deep RNA-sequencing to detect rare transcripts and better identify alternative splicing, the data still contain relatively few sequencing reads that uniquely map to splice junctions, limiting the confidence with which transcript-level differences can be ascribed to differences in alternative splicing rates and mechanisms. Finally, the sequencing was performed on bulk tissue. This means that the results lack cellular resolution, but also that they comprise a much larger portion of the brain transcriptome than can usually be achieved with current single cell technologies.
Several large postmortem brain gene expression studies of mental illness have been published recently, some with larger samples than were employed here 6,9,34 . The present study complements those larger studies in several ways. None of those studies evaluated the subgenual anterior cingulate cortex (sgACC), a key component of limbic circuits that has been implicated in reward, impulse control, and emotion regulation 5 , as well as in major mood disorders. 7,8,[35][36][37] To date, genome-wide expression studies in ACC or sgACC have been carried out in fewer than 50 samples 38-40 . Some of the genes and transcripts whose expression we found to be associated with GWAS hits were detected only in ACC, consistent with the idea that common risk alleles may regulate expression in a tissue-specific manner 41 . None of the previous studies employed RNA-sequencing at a depth comparable to the 270 million reads per sample used here, which enabled us to capture at least 10 reads per sample in more than 21,000 genes and 85,000 transcripts. This was particularly beneficial in transcript-level analyses, which found numerous DE transcripts, shifts in relative transcript abundance that were associated with numerous common variants (sQTLs), and strong evidence that such variants play a disproportionately large role in the heritability of SCZ, BD, and MDD. The present study was also unique in its use of case-case comparisons, which revealed unexpectedly large numbers of genes and transcripts that were DE between case groups. Many DE genes and transcripts showed opposite directions of expression between cases, when compared to controls, suggesting one important way in which genetically correlated disorders can express contrasting transcriptomes that could reflect diagnostic differences in onset, symptoms, or treatment response.
Most of the transcripts whose expression was associated with common genetic variants in this study are predicted to arise from classical alternative splicing mechansims. These findings are consistent with previous results suggesting that alternative splicing plays an important role in differential transcriptome expression in major psychiatric and other disorders and that it is mediated, at least in part, by some of the same genetic variants identified by GWAS 32 . A secondary analysis of RNA-sequencing data from psychiatrically unaffected individuals drawn from the CMC dataset found evidence of alternative splicing associated with common genetic variants, some of which overlapped with known SCZ risk loci, but did not assess for differential expression in individuals diagnosed with SCZ 30 . A recent study of post-mortem human cortex at 85 million reads per sample detected shifting isoform usage across pre-and post-natal life and a number of transcripts that were DE in SCZ 41 .
Our results support those previous findings and demonstrate that common genetic variants associated with relative transcript abundance (sQTLs) account for disproportionate fractions of disorder-specific heritability.
Taken together, these findings suggest that inherited genetic risk factors shape the brain transcriptome and contribute to diagnostic differences between broad classes of mental illness. This may be one important mechanism by which diagnostically distinctive patterns of transcript expression arise despite the known correlations in inherited genetic risk factors across major psychiatric disorders. Despite the growing recognition of the importance and complexity of alternative splicing, its role in the development of neuropsychiatric diseases is still poorly understood. Much future work is needed to characterize the functions of alternative transcripts, the timing of alternative isoform usage during disease course, and the potential impact that therapeutic agents may have on the expression or function of particular isoforms in specific brain regions and cell types.

Tissue availability
Tissue samples are available to qualified scientists upon review by the NIMH Human Brain Collection Core oversight committee. Information is available at https://www.nimh.nih.gov/research/research-conducted-atnimh/research-areas/research-support-services/hbcc/distribution-of-resources.shtml          Known genes and transcripts belonging to autosomes and pseudo-autosomal regions (PAR) were included in the analysis. Gene and transcript counts were obtained using StringTie 3 software (http://ccb.jhu.edu/software/stringtie/index.shtml?t=manual. Sample and Gene selection: Pre-noise filtered quantile (Log2(FPKM+2)) expression values for all 200 samples were used to calculate Pearson correlations between each possible pair of samples within a diagnostic category.
The average correlation for each sample within a diagnostic category was obtained, providing a single estimate of relatedness of that sample to all others in its diagnostic category. We then tested for deviance of each sample to the rest of their diagnostic group by a z-score test, choosing a conservative threshold of p<0.1 to define individual samples as outliers. By this approach, 15 samples were identified and removed as outliers (Supplementary Figure   10). Lowess modeling was performed by disease class to remove noise-biased expression, thus resulting in ~21K genes (Supplementary Figure 11).

Covariates:
We tested for a total of 23 known covariates: RIN, age, gender, reported race, pH, RNA extraction batch, library batch, postmortem interval (PMI), PMI confidence, GC percent, manner of death, smoking status, source location, height, and weight, along with available data on alcohol, anti-depressants, mood stabilizers, benzodiazepines, nicotine, THC, cocaine, and opiates. Univariate and step-wise AIC regression analysis was performed on all the covariates. Any covariates that were significant after the multiple testing were included in the downstream analysis.

Gene-and transcript-level analysis:
We normalized the raw counts using conditional quantile normalization 4 (CQN) (https://bioconductor.org/packages/release/bioc/html/cqn.html) which corrects for library size, gene length and GC content. These normalized counts were then imported to DESeq2 5 (https://bioconductor.org/packages/release/bioc/html/DESeq2.html) and transformed to variance stabilized data (VSD). Principal component analysis (PCA) using VSD data resulted in 10 PCAs all of which together explain >97% of variance (each PCA >= 0.3%). Each of the 10 PCAs were linearly regressed against the 23 known covariates. Only 3 covariates -RIN, race and GC percent were significant after correcting ted for multiple tests, so these were included in the downstream analysis. A total of 185 samples (55 controls, 35 bipolar disorder, 44 schizophrenia, 51 major depression) were included in the analysis after removing 15 outliers. A total of 21,228 genes and 85,295 transcripts were included in the analysis. DESeq2 along with lfcShrink option was used for differential expression analysis between each diagnostic group versus the controls. The p-values were corrected using FDRtool 6 software in R.
Analysis of variance (ANOVA) analysis: DE genes (p<0.05) in each of the disorder were validated by ANOVA after correcting for the covariates. Dunnet's multiple comparisons tests were used as follow up tests to ANOVA.

DAVID Functional annotation:
Functional annotation was performed in DAVID which is a database of annotation, visualization and integrated discovery 7 (https://david.ncifcrf.gov/content.jsp?file=citation.htm). All 85,295 transcripts that met criteria for inclusion in the DE analysis were used as background genes.