Investigating the potential anti-depressive mechanisms of statins: a transcriptomic and Mendelian randomization analysis

Observational studies and randomized controlled trials presented inconsistent findings on the effects of cholesterol-lowering statins on depression. It therefore remains unclear whether statins have any beneficial effects on depression, and if so, what the underlying molecular mechanisms are. Here, we aimed to use genomic approaches to investigate this further. Using Connectivity Map (CMap), we first investigated whether statins and antidepressants shared pharmacological effects by interrogating gene expression responses to drug exposure in human cell lines. Second, using Mendelian randomization analysis, we investigated both on-target (through HMGCR inhibition) and potential off-target (through ITGAL and HDAC2 inhibition) causal effects of statins on depression risk and depressive symptoms, and traits related to the shared biological pathways identified from CMap analysis. Compounds inducing highly similar gene expression responses to statins in HA1E cells (indicated by an average connectivity score with statins > 90) were found to be enriched for antidepressants (12 out of 38 antidepressants; p = 9E-08). Genes perturbed in the same direction by both statins and antidepressants were significantly enriched for diverse cellular and metabolic pathways, and various immune activation, development and response processes. MR analysis did not identify any significant associations between statin exposure and depression risk or symptoms after multiple testing correction. However, genetically proxied HMGCR inhibition was strongly associated with alterations in platelets (a prominent serotonin reservoir) and monocyte percentage, which have previously been implicated in depression. Genetically proxied ITGAL inhibition was strongly associated with basophil, monocyte and neutrophil counts. We identified biological pathways that are commonly perturbed by both statins and antidepressants, and haematological biomarkers genetically associated with statin targets. Our findings warrant pre-clinical investigation of the causal role of these shared pathways in depression and potential as therapeutic targets, and investigation of whether blood biomarkers may be important considerations in clinical trials investigating effects of statins on depression.

that are not derived from their primary target tissue types, provided that their targets are expressed in the cell lines [1]. We queried the Protein Atlas [3] and DepMap expression platforms [4], and confirmed that HMGCR, the primary target of statins, was ubiquitously expressed in diverse human tissues and cell lines, including kidney and brain. Detailed analysis of statin gene expression signatures was performed on HA1E signatures generated in selected experimental conditions (time = 24 h, dose = 10 µM). The gene expression signatures of antidepressants in HA1E cells were also retrieved to enable analysis of the differentially expressed genes shared by both statins and antidepressants. The gene expression signatures of alvespimycin (heat shock protein inhibitor) [5] and sirolimus (mTOR inhibitor) [6] were analyzed as controls, as they were reported by the CMap team to induce consistent and strong transcriptomic responses across cell lines, and are not functionally linked to strong anti-depressive effects. Limited evidence suggests a cholesterol-modifying effect of sirolimus [7,8], with little evidence on the effect of alvespimycin on lipids.

CMap NPC signatures
NPC cells are undifferentiated progenitor cells that give rise to glial and neuronal cell types in the central nervous system [9]. In this study, as a sensitivity analysis, we analyzed the CMap gene expression signatures of statins in NPC cells as they are biologically more relevant to depression.
Previous analysis of the CMap data reported that out of a total of 189 compounds that induced consistent gene expression changes (indicated by high connectivity scores) in neuronal cell lines, 34% showed similar gene expression signatures between the NPC cells and the nine core cancer cell lines (details on core cell lines presented below), compared to 50% that showed similar gene expression signatures amongst the nine cancer cell lines [2]. This indicates that while for some compounds, their gene expression signatures from the nine cancer cells can be used as proxy for their signatures in brain cell lines, a large proportion of compounds may induce different gene expression changes in neuronal cell lines [2]. However, while these differences may be due to biological variations between neuronal and cancer cells, it remains unclear how well NPC cells proxy for drug response in neurons. It is also important to note that NPC cells are less well profiled compared to the nine core cancer cell lines, and unlike the generally homogenous cancer cell lines, NPC cells are highly heterogenous cell populations with greater diversity in molecular and transcriptomic characteristic [9].

CMap connectivity scores
In this study, to compute the connectivity profile of statins using the CMap platform, we The CMap platform uses an algorithm based on the weighted Kolmogorov-Smirnov enrichment statistic to quantify the similarity between a query signature and a reference CMap signature where the genes are ranked from the most up-regulated (high ranking) to the most down-regulated (low ranking) [2] (Supplementary Figure 1). The degree of connectivity is computed as Tau scores, which represent a statistical measure of the likelihood of observing the similarity between the query and reference signatures, given all transcriptomic signatures in the reference database. The Tau scores range between -100 and 100. A positive Tau score indicates positive connectivity, meaning that genes up-regulated in the query signature predominantly have high ranking in the reference signature, and genes down-regulated in the query signature predominantly have low ranking in the reference signature (Supplementary Figure 1). In contrast, a negative Tau score indicates negative connectivity, meaning that the query gene expression signature is essentially 'reversed' in the reference signature, i.e. up-regulated genes in the query signature predominantly have low ranking in the reference signature, and down-regulated genes in the query signature predominantly have high ranking in the reference signature (Supplementary Figure 1). For example, one would expect an agonist and an antagonist for the same target to have high negative Tau scores.
A subset of the CMap dataset, termed the "Touchstone" collection, contain gene expression signatures of over 8,000 well-annotated chemical and genetic perturbagens, profiled in a "core" set consisting of nine cancer cell lines (A375, A549, HA1E, HCC515, HEPG2, HT29, MCF7, PC3, and VCAP). The Touchstone reference dataset is used in the CLUE platform as the reference set against which queries are compared to compute connectivity profiles. For each query signature, the CLUE algorithm computes a connectivity profile for each of the nine core cancer cell lines (Supplementary Figure 1). For example, a connectivity profile from HA1E cells indicate the connectivity between the query signature and the reference gene expression signatures generated by exposing HA1E cells to the perturbagens. CLUE also computes a summary connectivity profile, as described by Subramanian et al. [2], which indicates the connectivity between the query signature and the reference signatures summarized across cell-type-specific connectivity scores (Supplementary Figure 1).

Transcriptome-wide pairwise correlation
We also interrogated the correlation between a pair of signatures for all 12 328 genes, using Pearson correlation in R.

Annotation of biological process terms with ancestor terms
Gene Ontology (GO) biological process terms are placed in a hierarchical structure consisting of parent/child term relationships [10]. While functional annotation by gProfiler2 provides detailed information on the enriched biological processes, the resulting list of significant terms often contains overlapping and highly redundant biological processes, confounding the interpretation of results. We thus summarized the GO biological process terms based on their high-level ancestor terms. Using GO.db (version 3.13.0) [11], we retrieved the child terms of biological process (GO:0008150), which is the root term of all GO biological pathway terms, and defined these child terms as the list of ancestor biological process categories used for downstream annotation. We annotated each statistically significant GO biological process term based on their ancestor categories. Each GO term might be annotated with more than one ancestor category. Similarly, we also categorized the GO biological terms into primary metabolic process (GO:0044238) (a child term of "Metabolic process") and immune system process (GO:0002376) terms.

Annotating antidepressants
The Anatomical Therapeutic Chemical (ATC) classification system provides a reference for drugs, which are hierarchically categorized based on their targeted organs as well as their therapeutic, pharmacological and chemical attributes [12]. While we acknowledge that there are non-ATCprofiled drugs or active substances with anti-depressive properties, in this study we limited our analysis to antidepressants documented by the ATC system (ATC code: N06A) as they were more likely to have established efficacy and thereby provided a valid basis for interrogating the molecular mechanisms underlying their anti-depressive effects. A total of 38 ATC-documented antidepressants are profiled by the CMap database (Supplementary Table 3).

Identifying shared perturbed pathways between statins and antidepressants
We hypothesized that if statins and antidepressants indeed exhibited shared pharmacological effects, they would likely perturb the same biological pathways. Out of the 38 antidepressants profiled in CMap, we selected five antidepressants (desipramine, nortriptyline, paroxetine, sertraline and trimipramine) for further analysis as they demonstrated the highest average connectivity with the six statins studied in HA1E cells. We performed pairwise comparison of the transcriptomic signatures of six statins against the five antidepressants, which gave rise to a total of 30 statin-antidepressant combinations. For each pair of statin and antidepressant, genes that were differentially expressed (defined by |z| > 1) in both signatures were further categorized into genes perturbed in the same or opposite direction. Using gProfiler2 and parameters described in Methods, pathway enrichment analysis was performed for each category. Similarly, we also performed pairwise pathway enrichment analysis for genes commonly perturbed by statins and the two control drugs (alvespimycin and sirolimus).

Mendelian randomization
The principle underlying Mendelian randomization (MR) originates from Mendel's laws of segregation and independent assortment. By Mendel's laws, alleles randomly segregate and are passed from parents to offspring during the process of meiosis, and the inheritance of alleles of one variant occurs independently of alleles of other variants [13]. In randomized controlled trials, participants are randomly allocated to the treatment or control groups, and thereby subjected to different levels of exposure. MR uses genetic instruments linked to the exposure of interest as proxies, as if the level of exposure is randomly allocated at conception. This random assignment of genetically determined exposure at conception is less likely to be affected by various lifestyle and socioeconomic factors, which may otherwise distort the exposure-outcome associations [14,15]. Furthermore, as the genetic determinants for exposures are generally fixed at conception and thus not affected by the outcome, MR is less prone to reverse causation, which may be problematic to dissect using observational studies [16]. Genetic variants are only valid instrument variables for MR analysis if they meet three key assumptions, that they are strongly associated with the exposure, not associated with any confounder of the exposure-outcome association, and are associated with the outcome only through the exposure of interest (no horizontal pleiotropy) [17].
The selection of genetic instruments depends on the genetic architecture of the exposure [18]. For polygenic traits, such as low-density lipoprotein cholesterol (LDL-C) levels, the genetic proxy for these traits is derived from genetic variants from multiple trait-associated loci [18]. In contrast, in cases where the exposure is a protein, such as HMGCR, its genetic instruments may be selected from loss-of-function mutations, expression quantitative trait loci (eQTL) or protein quantitative trait loci (pQTL) located within or nearby the protein-encoding gene [18]. Previous studies have reported an enrichment of eQTLs in single nucleotide polymorphisms (SNP) associated with complex traits, and suggested gene expression as an important mediator of SNP-trait associations, supporting the utility of eQTLs in identifying genetic determinants of traits and diseases [19,20].
Furthermore, eQTLs are widely used as genetic proxies for drug targets in MR analyses [21,22].
In this study, we assessed the strength of genetic instruments using the F-statistics from the linear regression model of the genetic instruments with the exposure (gene expression). We determined SNPs with F-statistic > 10 to be suitable genetic instruments, where F-statistic was calculated as previously described [18,21].

eQTLGen blood eQTLs
The eQTLGen consortium has produced an extensive catalogue of both cis and trans eQTLs for human gene expression in blood [23]. The blood eQTL data were generated via meta-analysis of over 31 684 samples from 37 cohorts consisting of primarily European-ancestry individuals [23].
In the current study, only cis-eQTLs were considered for the selection of genetic instruments.

PsychENCODE eQTLs
To use genetic instruments relevant to the biology of depression, we retrieved brain eQTL data produced by the PsychENCODE consortium. The PsychENCODE eQTL dataset contains prefrontal cortex eQTLs identified in a cohort of 1 387 individuals [24].

Genetic instrument for HMGCR inhibition
To investigate the on-target effect of statins, we selected rs12916 as the genetic proxy for HMGCR inhibition. The rs12916 SNP is located in the 3¢ untranslated region of the HMGCR gene, and the genetic associations between the rs12916-T allele and metabolomic traits were shown to mirror the actual effects of statin exposure on the metabolome [25]. It is widely used as a genetic proxy for assessing statin effects [26,27,28]. Rs12916 is a strong eQTL for HMGCR expression, where each additional rs12916-T allele is associated with 0.1 standard deviation (SD) decrease in HMGCR expression in blood (eQTLGen: p = 1.5E-36). As a sensitivity analysis, we also selected rs17671591, a strong eQTL (p = 2.5E-05) for HMGCR expression in the brain prefrontal cortex (PsychENCODE [24]), as the genetic instrument for HMGCR inhibition in the brain. Rs17671591 is in moderate linkage disequilibrium (LD) (r 2 = 0.6) with rs12916 amongst European-ancestry individuals.

Statin off-target inhibition of ITGAL and HDAC2
We used the DrugBank database [29] to identify off-target effects of statins, and found that selective statins exhibited in vitro inhibition of Integrin Alpha-L (ITGAL) and Histone Deacetylase 2 (HDAC2). Although atorvastatin has been reported to exhibit additional off-target effects (namely DPP4, AHR and NR1I3) [29], these off-target effects are unique to atorvastatin, and thus were not investigated in this study.
ITGAL is a component of the heterodimeric Lymphocyte Function-associated Antigen-1 (LFA-1) receptor, which is expressed in diverse immune cell populations and plays a role in modulating cell adhesion [30,31]. Amongst the statins investigated in this study, lovastatin, rosuvastatin and simvastatin are found to show in vitro binding to ITGAL, subsequently inhibiting LFA-1 function and T-cell adhesion, proliferation, and cytokine production [32,33,34]. At the same time, a subset of statins (atorvastatin, fluvastatin, lovastatin, pravastatin and simvastatin) are also found to exhibit in vitro inhibition of HDAC2 [35], which is involved in chromatin remodeling [36].
Using the eQTLGen data [23], we selected the most significant eQTLs in blood to proxy for inhibition of ITGAL and HDAC2. Specifically, each additional rs11574938-C allele was associated with 0.21 SD decrease of ITGAL expression (p = 7.9E-150), and each rs9481408-T allele was associated with 0.048 SD decrease of HDAC2 expression (p = 4.1E-07) in blood.

Genetic instrument for PCSK9 inhibition
We sought to investigate whether any genetic associations observed for statin targets were likely to be mediated independently of cholesterol lowering, by extending MR analysis to other lipidlowering medications, namely Proprotein Convertase Subtilisin/Kexin type 9 (PCSK9) inhibitors [37]. PCSK9 eQTLs were absent in the eQTLGen dataset, as PCSK9 was likely removed from meta-analysis due to the lack of variation in expression. We thus identified PCSK9 eQTLs from the GTEx eQTL dataset. The GTEx project (version 8) identified and profiled eQTLs from diverse tissue types [38]. The majority of the GTEx cohort consists of individuals of European ancestry.
We retrieved PCSK9 eQTLs identified in whole blood, which contains 670 samples, and performed MR analysis using a strong eQTL (rs12117661; p = 8.3E-11) as the genetic instrument for PCSK9 expression. This eQTL was previously found to be moderately associated with plasma PCSK9 protein expression (p = 0.00029) [39].

HEIDI test
To validate that the exposure-outcome association was mediated through one single causal SNP, rather than via LD between separate SNPs (linkage scenario), the Heterogeneity in Dependent Instruments (HEIDI) test was performed as a sensitivity analysis [40]. Without correcting for multiple testing, we used a HEIDI p-value threshold of 0.01 to define statistical significance, which was a stringent threshold. A HEIDI p-value > 0.01 indicated that the estimated effect of exposure on outcome was likely due to single causal variants, while a p-value < 0.01 suggested that the observed association was potentially attributable to the linkage scenario.

LocusCompare plots
LocusCompare plots allow the visualization of the distribution of genome-wide association studies (GWAS) or eQTL summary statistics [41]. Using the LocusCompareR package (version 1.0.0) [41], we generated the LocusZoom and significance (-log10P) scatter plots for LDL-C and HMGCR eQTLs from eQTLGen. The LD scores were generated based on the European-ancestry population.

Supplementary Figures
Supplementary Figure 1. Schematic illustration of statin connectivity profile generation. (A) For each perturbagen of interest (e.g. a chemical compound), its gene expression signature is generated by exposing human cell lines to the perturbagen. In this study, the gene expression signatures of statins were retrieved from the CMap dataset (GSE92742). To compute the connectivity profile of each statin signature, a query signature is contructed consisting of n most up-regulated (z-score > 1) landmark genes (referring to genes directly measured in the L1000 array) and n most downregulated (z-score < -1) landmark genes from the statin gene expression signatures. Using the online CLUE algorithm, the query signature is compared against every perturbagen gene expression signature from the Touchstone reference dataset. Reference signatures that show concordant gene expression changes to the query signature are assigned a positive connectivity (Tau) score, while reference signtures that show discordant gene expression changes to the query signature are assigned a negative connectivity score. (B) For each query signature, the CLUE algorithm computes cell-type-specific connecivity profiles, generated by comparing the query signature against the reference signatures profiled in each cell line seperately. By summarizing the connectivity profile across the nine core cell lines, CLUE also computes a summary connecivity profile.   Supplementary Figure 5. Genes functionally involved in lipid and immune pathways were widely perturbed by statin exposure in HA1E cells. A total of 366 biological process terms were identified as significantly enriched amongst genes up-regulated or down-regulated (|z| > 1) by at least one statin, and categorized into high-level ancestor terms. Y-axis shows the ancestor biological process terms, as well as the child terms of primary metabolic process (GO:0044238) (a child term of "metabolic process") and immune system process (GO:0002376). The arrows indicate ancestorto-child relationships of GO terms. Bar graph shows the percentages of significant GO biological process terms annotated with each ancestor terms, with the corresponding counts shown on the graph. The bubble plot shows the statin compounds for which the biological processes were identified as significantly enriched.
Supplementary Figure 6. Pathway enrichment results using more stringent z-score thresholds for defining differentially expressed genes. Pathway enrichment analysis was performed for differentially expressed genes, defined by (A) absolute z-score > 1.5 and (B) absolute z-score > 2, where a total of 324 and 237 biological pathways were identified as significantly enriched respectively. Enriched biological process terms were categorized into high-level ancestor terms. Y-axis shows the ancestor biological process terms, as well as the child terms of primary metabolic process (a child term of "metabolic process") and immune system process. The arrows indicate ancestor-to-child relationships of GO terms. Bar graph shows the percentages of significant GO biological process terms annotated with each ancestor term, with the corresponding counts shown on the graph. The bubble plot shows the statin compounds for which the biological processes were identified as significantly enriched.   Supplementary Figure 13. Pathway enrichment results of statin-induced differentially expressed genes in NPC cells. Pathway enrichment analysis was performed for up-and down-regulated genes, defined by (A) absolute z-score > 1, (B) absolute z-score > 1.5 and (C) absolute z-score > 2, where a total of 136, 69 and 21 biological pathways, respectively, were identified as significantly enriched. Enriched biological process terms were categorized into high-level ancestor terms. Yaxis shows the ancestor biological process terms, as well as the child terms of primary metabolic process (a child term of "metabolic process") and immune system process. The arrows indicate ancestor-to-child relationships of GO terms. Bar graph shows the percentages of significant GO biological process terms annotated with each ancestor term, with the corresponding counts shown on the graph. The bubble plot shows the statin compounds for which the biological processes were identified as significantly enriched.   Figure 16. LocusCompare plot of HMGCR eQTLs (eQTLGen) and LDL-C GWAS summary statistics. Each dot represents a SNP and is colored based on their degree of LD (r 2 ) with rs12916 (purple diamond). The LocusZoom plots (right panels) show the significance (-log10P) of SNPs against their genomic locations. The scatter plot (left panel) shows the SNP significance in the eQTL dataset plotted against the SNP significance in the LDL-C GWAS dataset.

Supplementary Tables
Supplementary Table 1