Blood-based gene expression signatures of medication-free outpatients with major depressive disorder: integrative genome-wide and candidate gene analyses

Several microarray-based studies have investigated gene expression profiles in major depressive disorder (MDD), yet with highly variable findings. We examined blood-based genome-wide expression signatures of MDD, focusing on the molecular pathways and networks underlying differentially expressed genes (DEGs) and on the behaviour of hypothesis-driven, evidence-based candidate genes for depression. Agilent human whole-genome arrays were used to measure gene expression in 14 medication-free outpatients with MDD who were at least moderately ill and 14 healthy controls matched pairwise for age and sex. After filtering, we compared the expression of all retained probes between patients and controls and identified DEGs. The DEGs were evaluated by pathway and network analyses. For the candidate gene analysis, we used 169 previously prioritized genes and examined their case-control separation efficiency and correlational co-expression network in patients relative to controls. The 317 screened DEGs mapped to one significantly over-represented pathway, the “synaptic transmission” pathway. The protein-protein interaction network was also significantly enriched and included a number of key molecules for depression. The co-expression network of candidate genes was markedly disrupted in patients. This study provided evidence for an altered molecular network along with several key molecules in MDD and confirmed that the candidate genes are worthwhile targets for depression research.

Overabundance plot showing the number of gene probes (among the total 28,439 probes) that were differentially expressed between patients and controls as determined by the moderated t-test.
Volcano plot of differentially expressed genes between patients and controls.
The x-axis represents the logarithm base 2 of the fold change of expression in patients relative to controls. The y-axis represents the negative logarithm base 10 of the p value for the moderated t-test comparing expression between patients and controls. Gene probes located in the right arm of the volcano are up-regulated in depression and those located in the left arm are down-regulated. Vertical green lines indicate the thresholds of fold change (i.e., 1.5 and 1/1.5). A horizontal green line indicates the significance threshold of p value (i.e., 0.01). Thus defined, blue dots in the upper-right and upper-left corners represent those gene probes (n = 317) that passed the hybrid thresholds, and the remaining grey dots represent those probes that did not pass one or both of the thresholds.
Principal component analysis plots for expression profiles of the total 28 subjects, using the 317 probes with p less than 0.01 and absolute fold change greater than 1.5.
The first two principal components are presented.
As expected, separation efficiency was higher than when all probes were used (Figure S2).
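The hybrid thresholds behind this 317-probe set (p < 0.01 and absolute fold change > 1.5) can be illustrated with a minimal sketch. It assumes that fold changes are supplied on the linear scale (patients relative to controls) and that p values have already been computed elsewhere; the moderated t-test itself is not implemented here. The function name `select_degs` and the probe identifiers are hypothetical.

```python
import math


def select_degs(probes, fc_threshold=1.5, p_threshold=0.01):
    """Return (probe_id, direction) for probes passing both thresholds.

    probes: iterable of (probe_id, fold_change, p_value), where fold_change
    is the linear-scale ratio of expression in patients to controls
    (an assumption of this sketch).
    """
    log_cut = math.log2(fc_threshold)  # symmetric cut-off on the log2 scale
    selected = []
    for probe_id, fold_change, p_value in probes:
        log_fc = math.log2(fold_change)
        # A probe must pass the fold-change AND the p-value threshold.
        if abs(log_fc) > log_cut and p_value < p_threshold:
            direction = "up" if log_fc > 0 else "down"
            selected.append((probe_id, direction))
    return selected


# Hypothetical probes: only the first two pass both thresholds.
example = [
    ("probe_A", 1.6, 0.005),   # up-regulated, significant
    ("probe_B", 0.5, 0.001),   # down-regulated, significant
    ("probe_C", 1.4, 0.0001),  # significant, but fold change too small
    ("probe_D", 2.0, 0.02),    # large fold change, but not significant
]
print(select_degs(example))
```

Working on the log2 scale makes the two fold-change cut-offs (1.5 and 1/1.5) symmetric around zero, which matches the two vertical lines of a volcano plot.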
Red triangle: depressed patient; blue square: healthy control subject.
Supplementary Figure S4 (317 probes)

The dendrogram was created by the clustering procedure described below. Sample IDs are defined in Supplementary Table S1. The bottom panel shows the expression values for each sample, with probes arranged vertically. Samples were clustered using the Euclidean distance as the dissimilarity measure, combined with the average linkage method.
Four patients with overall low expression patterns were isolated first, followed by one control subject with an overall high expression pattern. The remaining 23 samples were subdivided into two large clusters, the left one consisting mostly of controls and the right one mostly of patients. One patient and four control subjects were misclassified, resulting in 23/28 (82.1%) correctly classified samples.
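The clustering procedure named above (Euclidean distance with average linkage) can be sketched in a few lines of standard-library Python. This is only an illustration of the technique under simple assumptions, not the software used to draw the figure: it takes a small dictionary of hypothetical sample IDs mapped to expression vectors and returns the order in which clusters are merged, which is the information a dendrogram displays.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(samples):
    """Agglomerative clustering with average linkage.

    samples: dict mapping sample ID -> list of expression values.
    Returns a list of merges (cluster_a, cluster_b, distance), where each
    cluster is a tuple of sample IDs.
    """
    clusters = [(sample_id,) for sample_id in samples]
    merges = []
    while len(clusters) > 1:
        best = None
        for ca, cb in combinations(clusters, 2):
            # Average linkage: mean of all pairwise sample distances.
            dist = sum(
                euclidean(samples[i], samples[j]) for i in ca for j in cb
            ) / (len(ca) * len(cb))
            if best is None or dist < best[0]:
                best = (dist, ca, cb)
        dist, ca, cb = best
        clusters = [c for c in clusters if c not in (ca, cb)]
        clusters.append(ca + cb)
        merges.append((ca, cb, dist))
    return merges


# Hypothetical toy data: two tight pairs that join last.
toy = {"P1": [0, 0], "P2": [0, 1], "C1": [5, 5], "C2": [5, 7]}
for left, right, dist in average_linkage(toy):
    print(left, right, round(dist, 3))
```

The exhaustive pairwise search is adequate for 28 samples; larger data sets would call for an optimized library routine.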
Color range (expression value)
This figure indicates that the number of observed differences among candidate gene probes (bumpy line) exceeded not only the number of differences expected by chance (dotted line) but also, albeit slightly, the number of differences observed using all probes (curved line).
Supplementary Figure S6 Principal component analysis plots for expression profiles of the total 28 subjects, using the candidate gene probes for depression.
The first two principal components are presented.
(a) Using the 183 candidate gene probes. Separation efficiency was somewhat greater than when all probes were used (Figure S2).
Separation efficiency was increased, suggesting that a relatively small number of candidate genes may be useful in discriminating depressed patients from controls.

The identified differentially expressed genes (DEGs) were subjected to a sequence of bioinformatics analyses comprising gene ontology, pathway/network analysis, protein-protein interaction and literature mining. These analyses were conducted using publicly available databases. To understand the biological significance and evaluate the statistical enrichment (i.e., over-representation) of the DEGs, we first conducted gene ontology (GO) term enrichment analysis using the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7 functional enrichment tool 1. This analysis, along with the subsequent pathway enrichment analysis, was performed separately for the up-regulated and down-regulated genes, based on previous studies 2, 3. DAVID functional annotation clustering uses an algorithm to group similar, redundant and heterogeneous annotation content from the same or different resources into annotation groups. We focused on GO FAT terms for biological processes ("GOTERM_BP_FAT") so that overly broad GO terms were filtered out based on a measured specificity of each term. Only GO terms with a conservative EASE score (a modified Fisher's exact test p value) of less than 0.01 were considered significant. Clearly overlapping terms were collapsed into one term, so that only the term with the lowest p value was retained.
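To make the EASE score concrete, the following is a minimal sketch of the calculation as we understand it from DAVID's description: a one-sided Fisher's exact test on the 2 x 2 gene-list-versus-background table, computed after removing one gene from the list hits, which makes the result more conservative than the unmodified test. The function names are hypothetical, and the counts in the usage example are illustrative, not data from this study; DAVID's own handling of the background counts may differ in detail.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided (enrichment) Fisher's exact p value for the table
    [[a, b], [c, d]], i.e. P(X >= a) with all margins fixed."""
    row1 = a + b            # genes in the submitted list
    col1 = a + c            # genes annotated to the term
    total = a + b + c + d
    denom = comb(total, row1)
    upper = min(row1, col1)
    numer = sum(
        comb(col1, x) * comb(total - col1, row1 - x)
        for x in range(a, upper + 1)
    )
    return numer / denom


def ease_score(list_hits, list_total, pop_hits, pop_total):
    """EASE-style score: Fisher's exact test with one list hit removed.

    list_hits / list_total: term-annotated and total genes in the list.
    pop_hits / pop_total: the same counts in the background.
    """
    if list_hits < 1:
        return 1.0  # nothing to penalize; no evidence of enrichment
    return fisher_right_tail(
        list_hits - 1,
        list_total - list_hits,
        pop_hits,
        pop_total - pop_hits,
    )


# Illustrative counts only: 3 of 300 list genes versus 40 of 30,000
# background genes annotated to the same term.
plain = fisher_right_tail(3, 297, 40, 29960)
ease = ease_score(3, 300, 40, 30000)
print(f"Fisher: {plain:.4f}  EASE: {ease:.4f}")
```

Because one supporting gene is subtracted before testing, terms backed by only a handful of genes are penalized most strongly, which is the intended conservative behaviour.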
DAVID was then used to identify over-represented pathways. Specifically, the DEGs uploaded to DAVID were mapped onto the Biocarta 4, Kyoto Encyclopedia of Genes and Genomes (KEGG) 5, and Reactome 6 pathway databases, separately for the up-regulated and down-regulated genes.
To construct a protein-protein interaction network and analyze its enrichment, the DEGs were submitted to the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) v9.1 software 7. Finally, literature mining was performed to scrutinize the top 20 genes of the DEG list in terms of their relevance to depression and the central nervous system. To this end, we used Chilibot 8, a literature analysis tool that identifies co-occurrence of the names of a given gene (or genes) and a given keyword. The search term "depression" or "central nervous system" was used as the keyword. Each article retrieved by Chilibot was manually checked to ensure that the search term was used in the intended sense (for example, literature on "long-term depression", a lasting change in synaptic transmission, was excluded). In addition, manual PubMed literature searches were conducted on the same 20 genes to check for other important evidence of their involvement in the brain.
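The co-occurrence idea, together with the exclusion of unintended senses such as "long-term depression", can be illustrated with a small sketch. This is not Chilibot's algorithm and does not replace the manual check described above; it is a hypothetical sentence-level filter, with assumed function and variable names, that flags sentences mentioning both a gene name and a keyword once excluded phrases have been masked.

```python
import re

# Phrases in which the keyword carries an unintended meaning (assumed list).
EXCLUDED_PHRASES = ("long-term depression", "long term depression")


def cooccurring_sentences(text, gene, keyword, excluded=EXCLUDED_PHRASES):
    """Return sentences in which `gene` and `keyword` co-occur,
    ignoring keyword matches that fall inside an excluded phrase."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    gene_pattern = re.compile(r"\b" + re.escape(gene.lower()) + r"\b")
    hits = []
    for sentence in sentences:
        lowered = sentence.lower()
        # Mask excluded phrases so they cannot satisfy the keyword match.
        for phrase in excluded:
            lowered = lowered.replace(phrase, " ")
        if gene_pattern.search(lowered) and keyword.lower() in lowered:
            hits.append(sentence)
    return hits


# Hypothetical abstract text for illustration only.
abstract = (
    "BDNF levels are reduced in depression. "
    "BDNF modulates long-term depression in the hippocampus. "
    "CREB1 expression was unchanged."
)
print(cooccurring_sentences(abstract, "BDNF", "depression"))
```

A filter of this kind can only narrow the candidate sentences; whether a given sentence actually supports a gene's relevance still requires reading it.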