Introduction

Major depressive disorder (MDD) is a complex phenotype with moderate heritability estimates (31–42%).1 Physiological correlates include alterations in neurohormonal secretion, structural brain anatomy, and markers of immune function and inflammation. In contrast to the other two most-studied psychiatric disorders in adults (bipolar disorder and schizophrenia), or traits with comparable heritability such as Type 2 diabetes (26%),2 it has been challenging to identify relevant genetic factors for MDD from genome-wide association studies (GWAS), with the largest GWAS to date (with over 9000 MDD cases) detecting no significant associations with common single-nucleotide polymorphisms (SNPs).3 This might be explained by MDD’s phenotypic heterogeneity4,5 and complex genetic architecture,6 and its association with environmental factors that are likely to interact with genetics.7

Whole-transcriptome studies offer another type of genome-wide search for disease-related mechanisms by measuring mRNA expression levels of each gene in a relevant tissue. Although expression data do not directly disentangle cause–effect relationships, altered expression levels in disease can reflect the effect of common and/or rare genetic sequence variation, environmental factors, the effects of the disease process itself, and interaction between genetic variation and environmental factors. We applied deep RNA sequencing (RNA-seq) to whole-blood RNA in a large sample from a population-based survey research panel, including cases on and off psychiatric medication (unlike most clinical samples). This ascertainment method, which made whole blood the only feasible tissue for expression profiling, had the advantage of providing information on subjects’ “real-life” state (in contrast with cell lines or postmortem tissues), with the limitations of requiring statistical correction of many state-related, possibly confounding variables, and of having limited potential to identify some brain-specific mechanisms. On the other hand, there is increasing evidence implicating dysregulation of glucocorticoid and immune responses in MDD, and white blood cells may be a particularly suitable tissue for studying the relevant immunological factors.

We present here the largest whole-transcriptome study of MDD to date and the first using RNA-seq. After collecting psychiatric, demographic, environmental and medical information, we studied 922 European ancestry individuals (463 cases and 459 controls) with RNA-seq of whole-blood RNA and with a GWAS assay. Three types of analysis of association to MDD were then carried out: (i) analyses of individual gene expression levels produced a significant excess of low P-values, but no significant association of a single gene after correction for genome-wide testing; (ii) analyses of expression levels across pre-defined pathways and gene sets detected significant genome-wide association with the interferon (IFN) α/β signaling pathway; and (iii) joint evaluation of gene expression and SNP genotypes identified significant association to CINP (cyclin-dependent kinase 2 interacting protein), a gene involved in cell cycle arrest, possibly induced by IFN.8,9

Materials and methods

Subject recruitment and assessment

(See Supplementary Methods and Batte et al.,10 (in revision, Genome Research) for additional methodological details.) Recruitment was carried out under Institutional Review Board-approved protocols. Cases (recurrent or chronic MDD) and controls were recruited by Knowledge Networks (Menlo Park, CA, USA) from a nationally representative panel of ~60 000 individuals. Knowledge Networks invited 14 463 individuals (self-reported Caucasian ancestry, ages 21–60) for screening. Respondents were screened online with lifetime self-report versions of the depression and alcohol/substance dependence modules of the Composite International Diagnostic Interview-Short Form (CIDI-SF)11 plus screening items for psychotic and bipolar disorders. After excluding those who reported current dependence or lifetime psychotic or bipolar disorder, individuals who tentatively met inclusion criteria (see below) were invited to participate. Those who consented to be contacted answered additional online questions about height, weight, childhood trauma and smoking. Among those who completed blood draw by a national phlebotomy company (after giving written informed consent), 650 prospective cases and 589 prospective controls were interviewed by telephone with a modified Structured Clinical Interview for DSM-IV (full depression, bipolar, alcohol, substance and anxiety modules; psychosis screen; and a more detailed illness and medication history); Patient Health Questionnaire (PHQ-9) (current depression)12 and Generalized Anxiety Disorder 7-item scale (GAD-7) (current anxiety)13 scales; and a family history screen (MDD, bipolar disorder and suicide). Most telephone interviews were completed within 1–2 months of online screening and blood draw, with delays of several months in a small proportion of cases. (Sample sizes at each stage are provided in Supplementary Methods.)

Clinical data were reviewed by the clinical site principal investigator (PI) (MMW or JBP) and the overall PI (DFL). After exclusions for substance dependence, non-affective psychosis and bipolar disorder, we identified 475 cases with MDD with two or more lifetime episodes or one episode lasting 2 years, and 474 controls who never experienced a 2-week period with depressed mood plus two or more other MDD criteria. After exclusions for genotypically non-European ancestry, unusual medical comorbidities, and RNA-seq and GWAS quality control analyses, 922 individuals were included in further analyses (463 cases and 459 controls).

RNA-seq and genotype data

Whole blood was collected in PaxGene tubes (Qiagen, Valencia, CA, USA) for RNA and in acid-citrate-dextrose tubes for DNA. PaxGene tubes contain a proprietary cell lytic agent for immediate release of RNA from cells; mRNA levels remain stable for years with appropriate storage.14 Extracted DNA was genotyped with the Illumina Omni1-Quad microarray (Centrillion Biosciences, Mountain View, CA, USA). Post-quality control genotypes were available for 720 591 autosomal SNPs. RNA was extracted; most globin RNA was removed (GLOBINclear Kits, Life Technologies, Grand Island, NY, USA) (Supplementary Figures S1–S4); and Illumina TruSeq kits (Illumina, Inc., La Jolla, CA, USA) were used for RNA purification (polyA selection), chemical fragmentation, single-stranded cDNA conversion, DNA library preparation and oligonucleotide barcoding. Sequencing was carried out with an Illumina HiSeq 2000 (Illumina, Inc.; 50- or 51-bp single-ended reads, 3 multiplexed libraries per lane) yielding 70 million reads (average) per individual. Reads were mapped to the NCBI v37 Homo sapiens reference genome using TopHat.15 Complete genotype and RNA-seq data are available from https://nimhgenetics.org (see data access note below).

Gene expression was quantified with HTSeq using the ‘intersection_strict’ criteria.16 Only uniquely aligned reads were used to quantify gene expression levels. Reads were assigned to 21 578 of the 22 339 annotated protein-coding genes (NCBI v37). Analyses included 13 857 autosomal genes with at least 10 reads (in total from anywhere across the transcript) in at least 100 individuals. Effects of technical covariates (for example, per individual 5′ bias, GC bias, sequencing depth, and percent globin reads) and biological covariates (for example, estimates of blood cell-type proportions and time of day of blood draw; see Supplementary Table S1 for complete list) were removed by ridge regression of logarithm-transformed read counts. Cell-type proportions were inferred using a method based on non-negative least squares,17 making use of external microarray data on cell-type specific expression signatures18 (Supplementary Methods).
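The gene inclusion filter described above can be sketched as follows. This is a minimal illustration rather than the study's pipeline: the function name filter_expressed_genes, the dict-of-lists data layout and the toy gene IDs are hypothetical, and the criterion is read here as "at least 10 reads in at least 100 individuals", which is an assumption about the intended thresholds.

```python
def filter_expressed_genes(counts, min_reads=10, min_individuals=100):
    """Return gene IDs with >= min_reads reads in >= min_individuals samples.

    counts: dict mapping gene ID -> list of per-individual read counts
    (hypothetical layout; any gene-by-sample table would do).
    """
    kept = []
    for gene, per_individual in counts.items():
        n_passing = sum(1 for c in per_individual if c >= min_reads)
        if n_passing >= min_individuals:
            kept.append(gene)
    return kept


# Toy data with three individuals, so the thresholds are scaled down.
toy_counts = {
    "GENE_A": [25, 40, 12],  # reaches 10 reads in all three individuals
    "GENE_B": [0, 3, 50],    # reaches 10 reads in only one individual
    "GENE_C": [10, 10, 9],   # reaches 10 reads in two individuals
}
expressed = filter_expressed_genes(toy_counts, min_reads=10, min_individuals=2)
print(expressed)
```

Keeping the thresholds as parameters makes it easy to check how sensitive the number of retained genes is to the inclusion criterion.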

Association of gene expression levels with MDD

A likelihood ratio test (LRT) was used to determine the significance of association between the expression levels of each gene and MDD. Using LRT, the strength of the association is determined by comparing the likelihood of the null (‘background’) model that includes only a set of confounding factors (see below) with the likelihood of the full model (confounding factors plus expression value; Supplementary Methods). Final P-values were obtained using permutation analyses (8000 initial permutations or 1 000 000 permutations for three genes with P=0 in the initial 8000) (Supplementary Figure S5). False discovery rate (FDR) was used for multiple hypothesis correction.19
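The three ingredients of this procedure (a likelihood ratio statistic, a permutation P-value and FDR correction) can be sketched as below. The sketch rests on several assumptions: the log-likelihoods are made-up numbers rather than fitted models, the null statistics are simulated draws standing in for statistics recomputed on permuted MDD labels, and the Benjamini-Hochberg procedure is assumed for the FDR step because the text does not name the method. All function names are hypothetical.

```python
import random


def lrt_statistic(loglik_null, loglik_full):
    """Likelihood ratio statistic: 2 * (log L_full - log L_null)."""
    return 2.0 * (loglik_full - loglik_null)


def permutation_pvalue(observed, null_stats):
    """Fraction of permuted statistics at least as large as the observed one.

    This plain proportion is exactly 0 when no permutation reaches the
    observed value, in which case more permutations are needed.
    """
    exceed = sum(1 for s in null_stats if s >= observed)
    return exceed / len(null_stats)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: made-up log-likelihoods and a simulated null distribution.
rng = random.Random(0)
observed = lrt_statistic(loglik_null=-640.0, loglik_full=-632.5)
null_stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(8000)]  # stand-in null
p_perm = permutation_pvalue(observed, null_stats)
q_values = benjamini_hochberg([0.005, 0.01, 0.03, 0.04])
print(observed, p_perm, q_values)
```

The possibility of a permutation P-value of exactly 0 is why a small number of genes needed a much larger number of permutations than the initial 8000.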

In total, we corrected for 39 ‘background’ covariates (Table 1, Supplementary Table S2). These consist of 24 environmental, physiological and medical covariates, including body mass index, gender, smoking and age, and manually curated indicators of intake of medications and substances (each reported by at least 30 individuals), including cholesterol-lowering and antihypertensive medication classes, thyroid medication, cannabis and alcohol use, many of which were more common in cases and had a non-negligible effect on gene expression (Supplementary Table S2 and Supplementary Figure S10). Other covariates include five genotypic principal components (PCs) reflecting population structure20 (Supplementary Methods and Supplementary Figure S4); and 10 hidden confounding factors obtained as PCs of the residual expression data (Supplementary Figure S6). We decided to include expression PCs because previous studies have shown that removal of expression PCs results in improved power for detecting expression quantitative trait loci (eQTLs)21 and identifying true-positive co-expression relationships between genes.22 We also evaluated whether there was an excess of low P-values (deviation from the expected uniform distribution), by estimating the proportion of true positives using the π1 statistic.23
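The idea behind the π1 statistic can be illustrated with a simple estimator of the proportion of null tests, π0. The sketch below assumes a Storey-type estimator with a single fixed tuning value λ=0.5; the estimator actually used in the study may differ (for example, by smoothing over several values of λ), and the function name and toy P-values are hypothetical.

```python
def estimate_pi1(pvalues, lam=0.5):
    """Estimate the proportion of non-null tests as 1 - pi0(lambda).

    pi0(lambda) = #{p > lambda} / (m * (1 - lambda)), capped at 1.
    Null P-values are uniform, so the density of P-values above lambda
    reflects the share of null tests.
    """
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)
    pi0 = min(1.0, n_above / (m * (1.0 - lam)))
    return 1.0 - pi0


# Toy P-values: 800 evenly spread over (0, 1) plus 200 very small ones.
uniform_part = [(i + 0.5) / 800 for i in range(800)]
signal_part = [0.001] * 200
pi1 = estimate_pi1(uniform_part + signal_part)
print(round(pi1, 3))
```

With a perfectly uniform set of P-values the same function returns 0, which corresponds to no excess of low P-values.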

Table 1 Covariate values for cases and controls

Association of expression of gene pathways

Association of gene pathways was analyzed using two methods, based on all 1325 canonical pathways from MSigDB24 (c2.cp.v3.1) that contained 5–100 expressed genes in our data. Gene set enrichment analysis (GSEA)25 assessed the significance of each pathway using 5000 permutations of MDD status for each pathway, where the log(LRT P-values) was used as the score for each gene. Hypergeometric tests were used to evaluate over-representation of pathways among subsets of the top N genes (ranked by association P-value) compared with all expressed genes. To ensure the robustness of this result, we repeated the analysis with varying N (N={30, 60, 100, 150, 300, 500}). FDR was used for multiple hypothesis correction,19 accounting for the number of assessed pathways.
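The hypergeometric over-representation test can be written out directly from binomial coefficients. The sketch below is illustrative only: the function name is hypothetical, and the study-scale call uses counts quoted elsewhere in this paper (13 857 expressed genes, 49 expressed pathway genes, 17 pathway genes among the top 300) without any claim that it reproduces the reported enrichment P-values.

```python
from math import comb


def hypergeom_upper_tail(k, M, K, n):
    """P(X >= k) when n genes are drawn without replacement from M genes,
    K of which belong to the pathway."""
    total = comb(M, n)
    upper = min(K, n)
    numerator = sum(comb(K, x) * comb(M - K, n - x) for x in range(k, upper + 1))
    return numerator / total


# Small case that can be checked by hand:
# (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120 = 1/3.
p_small = hypergeom_upper_tail(k=2, M=10, K=4, n=3)

# Study-scale illustration (counts quoted from the text, see lead-in).
p_example = hypergeom_upper_tail(k=17, M=13857, K=49, n=300)
print(p_small, p_example)
```

Repeating the call for each value of N in the list above, and then correcting across pathways, mirrors the robustness check described in the text.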

Association of MDD with genetic variation and relationship with gene expression

Association between MDD and each genotyped SNP was evaluated using a standard logistic regression test, adjusted for five genotypic PCs (Supplementary Figure S4). We also looked up genotypic association P-values in the independent Psychiatric Genomics Consortium (PGC) MDD GWAS (9240 cases and 9519 controls).3 In a joint analysis of gene expression and genotypes, we identified genes whose genetic effect on MDD may be mediated through altering gene expression. Here, we used as a test statistic the least significant of two P-values for association of MDD with (i) expression of that gene and (ii) genotype of its strongest eQTL SNP (Supplementary Methods). By evaluating the maximum (the least significant) of these two P-values as a single test statistic, this analysis identifies relationships in which both expression and genotype support the association between the gene and MDD. Significance was evaluated by computing this statistic for 1 000 000 permutations of MDD status for each gene. To derive stable estimates of P-values, a Weibull extreme value distribution was fit to the permutation test statistics and used as the null distribution to estimate the probability of observing the statistic seen in the real data.
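The joint test statistic and its permutation-based significance can be sketched as follows. This is a simplified illustration under stated assumptions: the null statistics are built from pairs of independent uniform P-values rather than from P-values recomputed on permuted MDD labels, the Weibull extreme value fit is omitted, and the function names and example P-values are hypothetical.

```python
import random


def joint_statistic(p_expression, p_genotype):
    """Least significant (largest) of the two association P-values."""
    return max(p_expression, p_genotype)


def empirical_lower_tail(observed, null_stats):
    """Fraction of null statistics that are <= the observed one.

    Small values of the max-P statistic are the extreme ones, so the
    lower tail of the null distribution is used.
    """
    hits = sum(1 for s in null_stats if s <= observed)
    return hits / len(null_stats)


rng = random.Random(1)
# Stand-in null: independent uniform P-value pairs (an assumption).
null_stats = [joint_statistic(rng.random(), rng.random()) for _ in range(100000)]
observed = joint_statistic(0.02, 0.05)
p_joint = empirical_lower_tail(observed, null_stats)
print(observed, p_joint)
```

Because the statistic is small only when both P-values are small, a gene supported by expression alone or by genotype alone does not score well, which is the behavior described above. Under the independence assumption used in this stand-in null, the probability that the maximum falls below t is t squared.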

Clinical variables

Factor scores (principal components analysis) were computed separately from clinical variables and childhood trauma questionnaire responses, and association of these scores with IFN α/β signaling pathway (PC1) scores (Figure 1 legend) was analyzed by analysis of variance, corrected for age and sex (Supplementary Methods).

Figure 1

Interferon (IFN) α/β signaling pathway principal component (PC1) scores. Shown are the distributions of values of a score that summarizes expression levels of genes in the IFN α/β signaling pathway for cases (magenta) and controls (cyan). Each bar indicates the number of individuals with a score between the x axis value and the next higher value. Scores were computed as PC1 from principal components analysis of normalized read counts for the 20 genes (shown in Table 4) in the pathway with P<0.05 for association with major depressive disorder (MDD) individually (among the 49 genes, of 64 in the pathway, that passed the inclusion criterion of at least 10 total reads in at least 100 individuals). Gene set enrichment analysis (GSEA) identified a significant association (0.05 FDR) between IFN α/β pathway gene expression and MDD (note that GSEA uses expression data for all genes, and not the summary PC1 score which is shown in this figure). There is an excess of cases with higher scores, as shown by the numbers over the brackets. Raw read counts were initially corrected for technical and biological covariates (Supplementary Table S1: specimen-specific sequencing variables, RNA quality, white cell-type proportion estimates, time of blood draw). Analysis of case-control difference included additional covariates (see Materials and methods) including medication and substance classes seen in at least 30 subjects (Table 1 and Supplementary Table S2). Case scores were not predicted by clinical variables or childhood trauma scores (Supplementary Results). Case-control differences and enrichment of top gene subsets for this pathway persisted after excluding 91 individuals with rarer medical diagnoses or medications, estimating white cell-type proportions by a second method, controlling for intake of three antidepressant classes, or controlling for substance abuse/dependence or steroid medications.


Results

Association of single genes with MDD status

Analysis of single genes did not identify a genome-wide significant association (P<3.6E-6, correcting for 13 857 genes). An excess of small P-values was observed (Supplementary Figure S7). We quantified this excess by estimating the proportion of true-positive tests (π1=0.13),23 which indicated that expression levels of many genes are modestly associated with MDD. In addition, we performed a power analysis to estimate the effect size detectable with LRTs given the current cohort size.26 We estimated that we have 80% power to detect genome-wide significant P-values (<3.6E-6) for individual genes with an expected estimated odds ratio of ≥1.6 (log fold change of ~0.5). This analysis suggests that, should there be a true association from blood expression profiles, the odds ratio is likely <1.6. At a less stringent threshold (0.25 FDR), there were 29 associated genes (Table 2), with biological functions including innate immune processes, vesicle trafficking, cell cycle regulation and splicing. The top two genes were MINOS1 (organization of mitochondrial inner membrane, P<5E-6) and COPG (part of the Coatomer complex involved in vesicle trafficking between ER and Golgi, P<8E-6).
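The genome-wide significance threshold quoted above follows from a Bonferroni-style division of a family-wise error rate by the number of genes tested. The short check below assumes the conventional error rate of 0.05, which is not stated explicitly in the text; the variable names are hypothetical.

```python
n_genes = 13857   # autosomal genes tested (from the text)
alpha = 0.05      # family-wise error rate (assumed conventional value)
threshold = alpha / n_genes
print(f"{threshold:.1e}")
```

Under this assumption the computed value rounds to the 3.6E-6 threshold used for single-gene tests.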

Table 2 Top MDD-associated genes at 0.25 FDR

Pathway enrichment analysis

In analyses of the enrichment of pathways among sets of top genes, all tested subsets of 60–500 genes were significantly enriched (0.05 FDR) for one pathway: IFN α/β signaling (REACTOME) (Figure 1, Supplementary Figure S8, Tables 3 and 4 and Supplementary Table S7). GSEA also identified only this pathway at 0.05 FDR by permutation tests (uncorrected P=2.5E-5, Bonferroni-corrected P<0.05). Of the 64 genes in this pathway annotated in MSigDB,24 49 genes were adequately expressed here, of which 20 had a nominal P<0.05 (Table 4). Overexpression was observed for 34 of the 49 genes including 19 of the 20 nominally significant genes (Table 4 and Supplementary Table S7). No clinical variable was observed to predict IFN pathway (PC1) scores in cases (see Figure 1 legend and Supplementary Data 1 for the list of top 10 pathways).

Table 3 Enrichment P-values for association of IFN α/β signaling pathway and MDD
Table 4 IFN α/β signaling pathway genes with the strongest associations with MDD

Post-hoc re-assessment of potential confounders

Possible sources of spurious association with the IFN α/β signaling pathway were explored in post hoc analyses:

Unusual medication factors

We excluded 91 additional subjects with medications or illnesses with any potential impact on the immune system that were too rare for individual adjustment (Supplementary Table S4, for example, insulin, histamine-2 or leukotriene antagonists, antibiotics, immune suppressants; and autoimmune diagnoses). Enrichment of IFN α/β signaling pathway among subsets of 60–500 genes remained significant (0.05 FDR; Supplementary Table S3).

Cell-type proportions

Initial normalization of the data removed the effect of inferred cell-type proportions (Supplementary Methods), and as expected, we did not observe any correlation between these estimates and the residual expression levels of IFN pathway genes. (In raw data, IFN signaling is most strongly correlated with the proportion of activated dendritic cells, but that proportion is not predicted by MDD status.) As an additional check, we used an alternative computational method (ridge regression instead of non-negative least squares) to estimate cell-type proportions. The enrichment of IFN α/β signaling pathway among the top 100, 300 and 500 genes remained significant (0.05 FDR), after accounting for re-estimated cell-type proportions in the LRT (Supplementary Results).

Antidepressants

In cases, we computed LRT P-values for association of all gene expression levels with three separate antidepressant classes, each taken by at least 30 individuals: serotonin reuptake inhibitors, serotonin-norepinephrine reuptake inhibitors and bupropion (Supplementary Table S3). Of the 20 nominally significant IFN signaling genes (Table 4), none was among the top 300 genes associated with any of these classes, and no enrichment for this pathway (P>0.5) was observed in sets of top 30–500 associated genes (whereas 17 of the top IFN signaling genes were among the top 300 in the primary analysis of association with MDD).

Drugs of abuse and steroids

Enrichment analyses were repeated (Supplementary Table S3) with additional covariates to account for substance dependence (derived from covariates, Supplementary Methods) and for Fagerstrom nicotine dependence score; and for classes of steroid medications (primarily inhaled steroids for asthma or allergies). IFN α/β signaling genes were enriched (0.05 FDR) in the subsets of top 100–500 genes, after accounting for these covariates in the LRT.

Association of genetic variation with MDD status

One common SNP showed genome-wide significant genotypic association with MDD (rs11232553, chr11:80,941,531; P<3E-8; 1.8 Mb downstream of MIR708 and 660 kb upstream of MIR4300), but no association for this SNP was observed in the much larger PGC MDD data set (Supplementary Results). The joint analysis that combines expression data with genotype (Materials and methods) identified one genome-wide significant gene, CINP (0.05 FDR; uncorrected P=2.5E-4 for expression level alone, P=3.2E-4 for genotypic association of its top eQTL, rs2896439, and P=1E-6 for the combined test). The association of rs2896439 in the PGC cohort was only modest (P=0.035), which suggests caution in interpreting our result. Three other genes achieved FDR<0.1: SCAI (cell migration and regulation of cell cycle), SDK1 (axon guidance and implicated in HIV-associated nephropathy) and RABEPK (endosome to Golgi trafficking). We did not detect evidence of a relationship between genetic variation affecting IFN pathway genes and MDD in either analysis (joint analysis or genotype-only) of this cohort, or in a targeted analysis of the PGC data set (Supplementary Results).

Discussion

Type I IFN signaling and MDD

Significant association was observed between MDD and expression of IFN α/β signaling pathway genes in a large population-based sample, using whole-blood RNA reflecting the physiological state at the time of blood draw. Primary analyses removed the effects of a large set of technical, biological, medical and hidden covariates, and post hoc analyses did not identify measured factors that explained the finding.

IFN-α and IFN-β are type I IFNs (IFN-I), the main cytokines of the innate immune system that respond primarily to viral infection and to malignant cells. They activate genes that interfere with viral replication, activate other immune responses to infection and inhibit cell growth.27 A ‘weak’ (chronic) signaling mode, which is observed in the absence of known pathogens, may increase the efficiency of response to stimuli, but may also have a role in autoimmune and neuroinflammatory disorders.28 The binding of IFN-I to IFN receptors (IFNAR) can activate two main transcriptional complexes: IFN-α-activated factor, which mainly mediates activation of other cytokines, and IFN-stimulated gene factor 3 (ISGF3), which mainly mediates antiviral activity. Here, we observed the upregulation of ISGF3-induced genes in MDD cases (Table 4), including IRF9, a component of the ISGF3 complex.29

Increased IFN-I signaling in MDD is consistent with previous data implicating dysregulation of cytokines in depression. The most direct observation is that patients receiving IFN-I therapies (IFN-α for hepatitis C or IFN-β for multiple sclerosis) often develop clinically significant depression.30, 31, 32, 33 Previous studies have associated changes in secreted cytokines and inflammatory markers, including IFN-I,34,35 with a reduction in tryptophan, the precursor of serotonin, in both cerebral spinal fluid35 and plasma.36

More broadly, increased IFN-I signaling is relevant to a set of inter-related findings regarding the role of the immune system in the pathophysiology of MDD (reviewed by Zunszain et al.37): (i) Glucocorticoid dysregulation: hypersecretion of cortisol (stimulated by corticotropin releasing factor) is observed in MDD, possibly caused or mediated by glucocorticoid receptor resistance.37 (ii) Immune/inflammatory dysregulation: levels of circulating inflammatory cytokines are increased, particularly tumor necrosis factor alpha (TNF-α) and interleukin 6 (IL-6).38, 39, 40 Also, plasma levels of a non-specific inflammation marker, C-reactive protein, predicted depression in a Danish sample of over 73,000 individuals.41 (iii) Hippocampal volume reduction: hippocampal volume is reduced with longer durations of MDD,42 due either to cell loss (implicating inflammation-related apoptosis37,43) or to reduced cellular volume.37

Existing studies suggest multiple hypotheses concerning the causal relationships among these findings. An initiating factor could be stress-induced glucocorticoid secretion with resulting neuroinflammation, cytokine production and apoptosis.44 Alternatively, dysregulated cytokine production could directly cause depression (perhaps mediated by tryptophan depletion) and also increase glucocorticoid receptor resistance.45 Indeed, it has been proposed that sequence variants that predispose to depression in modern society may have been useful ancient adaptations to pathogens.46 It is reasonable to assume that there are multiple causal inter-relationships, reflecting a high degree of phenotypic and etiologic heterogeneity in MDD.

The finding of increased expression levels of IFN-I signaling pathway in this study is consistent with all of the above hypotheses. It could represent a key causal or mediating factor, and/or the end result of a complex process. However, a third possibility is that increased expression of IFN-I signaling is caused by a confounding factor that is downstream of or simply correlated with MDD. We did not find genetic variants impacting IFN pathway genes (either protein-coding variants or known eQTLs) to be significantly associated with MDD in this cohort. We note that there are multiple IFN-I gene subtypes in a cluster on chromosome 9, each of which had too few uniquely-mapped reads to meet the criterion for inclusion in the analyses for this study. They have highly homologous sequences, so that many sequencing reads in this region were excluded here because we counted only uniquely mapped reads.

We also should not overlook the possibility that, as the primary function of the IFN-I signaling system is response to viral infection, a subset of MDD patients could have unrecognized chronic and/or active infections, whose effects could be influenced by many genetic sequence variants and by other environmental factors. This hypothesis cannot be directly addressed by the present data but suggests an interesting future research direction.

Comparison with previous studies

MDD transcriptome studies of postmortem brain tissue have suggested alterations in glutamatergic and GABA-ergic pathways47 and in synaptic genes relevant to the anatomical finding of reduced dendritic spines.48 Interpretation is complicated by confounding factors (postmortem changes and drug exposures) and small samples (typically fewer than 30 cases).

Previous blood cell gene expression studies, typically targeting 1–15 genes in samples of 25–100 MDD cases, have reported expression increases for cytokine- and glucocorticoid-related genes, and reductions for glucocorticoid receptor and neuroplasticity-related genes such as BDNF.49 One study reported increased blood expression levels of IFN-α1, IFN-α2 and IFN-β1 in 22 MDD cases and 11 controls.32 There are two reports from a larger study that included microarray whole-transcriptome and GWAS data for 215 MDD cases and their relatives (San Antonio Family Heart Study): RNF123 was the most significant gene in a bivariate linkage analysis of MDD and gene expression values (P>0.5 here);50 and DISC1 SNPs showed genotypic association to MDD51 (P=0.025 for DISC1 here, but with increased expression, whereas previous studies predict reduced expression of DISC1). Whole-transcriptome association results have not been reported to date for that study. Interestingly, a recent study of subjective social isolation reported dysregulation of pro- and anti-inflammatory genes, including downregulation of the IFN-I signaling pathway in lonely individuals;52,53 we note that the impact of other unmeasured factors that modulate the IFN response (for example, age differences in the cohorts) may underlie the apparent discrepancy in the direction of effect of IFN-I signaling.

Meta-analysis of studies of serum cytokine levels in MDD suggests significant increases in TNF-α and IL-6;38 here, overexpression of TNFRSF10B was the thirteenth strongest individual gene finding (Table 2), whereas overexpression of IL-6 was not observed, although the strongest IL-6 findings have been in cerebral spinal fluid.35 There have been studies of IFN-λ (without significant findings overall38), but not type I IFNs. We were unable to measure serum biomarkers in the present study because the study was designed to collect a large national sample, and this design did not permit obtaining and then immediately centrifuging and preserving serum.

CINP and IFN

Joint analysis of association of MDD to expression levels and to genotypes detected an association with CINP. The antiviral component of IFN signaling results in cell cycle arrest and inhibition of cell growth by inhibiting cyclin-dependent kinases,9 and specifically CDK2.54 CINP interacts with CDK2 as a component of the DNA damage integrity checkpoint, and its suppression has been linked to cell cycle arrest.8 As noted above, however, support for this association is very modest in the larger PGC MDD cohort.

Strengths and limitations

Strengths of this study design include deep RNA-seq data; a large population-based sample with a representative female:male ratio (2:1), including medicated and unmedicated cases; a relevant tissue type for immunological mechanisms; and correction for multiple measured and unmeasured covariates. Limitations include use of non-brain tissue; the use of mixed white blood cell types (as opposed to cell-sorted data) and not having cell count measurements; and case-control differences in many physiological covariates (Table 1 and Supplementary Table S2). The detection of a single associated pathway, despite an excess of modestly significant P-values, also suggests that (as in GWAS) larger samples are needed. Our reliance on log-linear models to account for covariates could have left residual correlation due to non-linear effects. Also, we only adjusted for covariates relevant to at least 30 subjects in the primary analysis. Under-reporting of medical information by cases could have produced spurious results given our reliance on self-report (although none of those variables were associated with IFN-I signaling). IFN-I signaling could have been influenced by common viral infections unrelated to MDD, although these are seasonal and case status was not related to order of blood collection. We used two statistical estimates of white cell-type proportions based on expression signatures from microarray data, but these estimates might not have been sufficiently accurate, and they were based on microarray measurements, whereas our data is based on RNA-seq.

Finally, results were partially dependent on our approach to removing hidden confounding factors by accounting for expression PCs, which has been shown to produce more accurate estimations of eQTLs21,55 and of functional gene interactions.22 To avoid over-fitting, we removed the top 10 PCs based only on explained variance (Supplementary Figure S6), ignoring MDD status.

In a post hoc assessment, we do observe weaker (non-significant) associations between the IFN pathway and MDD status without removing the effect of top expression PCs. Although this indicates some sensitivity of our analysis to modeling choices, we have observed that the top expression PCs are correlated with confounding factors including age (Supplementary Figure S9), and thus it is likely that the chosen procedure yields more meaningful results.

Conclusion

We identified a significant association between MDD and increased expression of genes involved in IFN α/β signaling. This supports hypotheses of dysregulated cytokine activity in MDD, but gene expression data cannot resolve critical cause–effect relationships. Increased IFN-I signaling could be a direct cause of depression, resulting from some combination of genetic sequence variants, psychological stress, or normal or abnormal responses to unknown viral infection or other physiological stressors. Increased signaling could also be downstream of depression, or simply based on correlation of IFN-I signaling with unknown confounding variables that are correlated with depression. There are also plausible hypotheses (with several possible directionalities of effects) involving an interaction between immune dysregulation and increased resistance to the release of glucocorticoids. There is a need for studies that simultaneously measure clinical variables, genotypes, gene expression, viral sequences or antibodies, and proteins related to immune function and glucocorticoid dynamics.

Data access

Data from this study are available to qualified scientists from the NIMH Center for Collaborative Genomic Studies on Mental Disorders, including raw and aligned RNA-seq data, raw and normalized read counts per gene, and covariate data. Scientists who are interested in requesting data should consult https://www.nimhgenetics.org/access_data_biomaterial.php, which provides specific instructions about how to request data, including the name and e-mail address of the program official who is responsible for handling requests. When inquiring about data access, please reference this study as the Depression Genes and Networks study (DFL, PI).