Multiple sclerosis genetic and non-genetic factors interact through the transient transcriptome

A clinically actionable understanding of multiple sclerosis (MS) etiology goes through GWAS interpretation, prompting research on new gene regulatory models. Our previous investigations suggested heterogeneity in etiology components and stochasticity in the interaction between genetic and non-genetic factors. To find a unifying model for this evidence, we focused on the recently mapped transient transcriptome (TT), that is mostly coded by intergenic and intronic regions, with half-life of minutes. Through a colocalization analysis, here we demonstrate that genomic regions coding for the TT are significantly enriched for MS-associated GWAS variants and DNA binding sites for molecular transducers mediating putative, non-genetic, determinants of MS (vitamin D deficiency, Epstein Barr virus latent infection, B cell dysfunction), indicating TT-coding regions as MS etiopathogenetic hotspots. Future research comparing cell-specific transient and stable transcriptomes may clarify the interplay between genetic variability and non-genetic factors causing MS. To this purpose, our colocalization analysis provides a freely available data resource at www.mscoloc.com.


Scientific Reports
| (2022) 12:7536 | https://doi.org/10.1038/s41598-022-11444-w www.nature.com/scientificreports/ When we further dissected the mapping of the ROI colocalization signals, we found a significant excess of intergenic and intron regions (as anticipated), as well as their prevalent distribution away from the transcription start site (TSS; Figure Supplement 1A). Notably, when we extended this analysis to GWAS data coming from other multifactorial diseases or traits, dividing immune-mediated and other complex conditions, we found highly comparable profiles (supplementary Fig. S1B-C, Additional File: Table S2), suggesting that the colocalization between MS-associated DNA intervals and intergenic or intronic sequences, plausibly referring to trRNA coding regions, is shared by the genetic architecture of most multifactorial disorders.
To consolidate this result and gain a deeper biological insight, we extended the colocalization analysis matching the ROI with a vast set of databases of regulatory DNA regions, including enhancers and superenhancers, derived from experiments on diverse tissue types (a total of 4,697,782 DNA regions, plausibly coding for trRNA, were extracted from a wide variety of raw data sources; referenced in Additional File: Table S3). To improve interpretability of the results through ranking, we implemented a harmonic score Table 1. Enrichment of MS-associated genetic variants in lists of T-cell transient transcripts extracted from Michel et al., 2017. The whole transcriptome list was split in two sub-lists depending on the transcripts' halflife: short (< 60′) and long (≥ 60′), respectively. Results are considered significant at p < 0.05 and are highlighted in bold.  Table S4). On another hand, we found a strong enrichment of MS-associated genetic variants in cell lines of hematopoietic lineage, including CD19 + and CD20 + B lymphocytes, CD4 + T helper cells, and CD14 + monocytes. This is in line with the GWAS data and the known immunopathogenesis of the disease, as well as with the fact that we consider a TT collection coming from a lymphoid line for the co-localization analysis. Moreover, among the significant hits, we found collections coming from brain resident cell, in particular microglial-specific enhancers, which is in line with recent reports on brain cell type-specific enhancer-promoter interactome activities, and the latest GWAS on MS genomic mapping 33,34 .
Non-relevant tissues serving as controls (such as kidney, muscle, glands, etc.) scored low in the ranking, crowding the bottom-left corner of Fig. 1 (grey dots; see also Additional File: Table S4).

Genetic and non-genetic factors for MS etiology converge in genomic regions plausibly coding for the transient transcriptome.
Independent studies support the fact that MS GWAS intervals are enriched with DNA binding regions (DBRs) for protein 'transducers' mediating non-genetic factors of putative etiologic relevance in MS, such as vitamin D deficiency or EBV latent infection 17,19 . Therefore, we further inquired whether DNA regions plausibly coding for trRNA would share these features (i.e., they colocalize with such DRBs). We set up 4 new ROIs corresponding to the DBRs for VDR, activation-induced cytidine deaminase (AID), EBNA2, and Epstein Barr nuclear antigen 3 (EBNA3C), chosen among viral or host's nuclear factors potentially associated to MS etiopathogenesis [35][36][37] . The DBRs for each nuclear factor were derived from recent literature (Additional File: Table S5) and matched with the GWAS-derived MS signals to confirm and expand previous results. We found statistically significant results for VDR, EBNA2, and AID for all the SNP position extensions (± 50, 100, 200 kb up-and down-stream), while for EBNA3C significant results came out at extension of ± 100 and 200 kilobases. This finding suggests that several DBRs can impact on the MS-associated DNA intervals through colocalization ( Table 2). Building once again on the work by Michel et al. 25 , we inquired whether there was a colocalization between genomic regions containing MS-associated variants, DBRs for VDR, EBNA2, EBNA3C, AID, and DNA intervals plausibly coding for trRNA. To this end, we considered the transient transcriptome that proved to be enriched with MS-associated variants (Table 1), and we then matched the corresponding coding regions with the DBRs for the four molecular transducers. For this analysis DBRs for EBNA2 (6880 regions), EBNA3C (3835 regions), AID (4823 regions), and VDR (23,409 regions), represented the ROI, while the ENCODE database of Transcription Factors Binding Sites served as Universe (13,202,334 regions; Fig. 2a). We report the results of this analysis in Table 3, which shows the significant colocalization of DNA regions plausibly coding for trRNAs with both MS-relevant GWAS signals, and DBR of 3 out of 4 factors active at nuclear level, and potentially associated with MS. The DBR for EBNA3C did not reach statistical significance, though it showed higher values of support for short half-life transcripts.
To review and confirm previous colocalizations, we considered the genomic regions resulting from the above reported match between the MS-associated GWAS intervals and the databases of regulatory DNA regions, containing enhancers and super-enhancers, plausibly enriched in trRNA-coding sequences ( Fig. 2 and the online data resource). We therefore matched these DNA regions with the DBR for VDR, EBNA2, EBNA3C and AID, finding significant enrichments that allow to contextualize and prioritize genomic positions, cell/tissue identity or cell status associated to MS. Considering the harmonic score obtained from these colocalization analyses, the top hits in EBNA2, EBNA3C, and AID involved lymphoid (CD19 + B cell lines and lymphomas; T regulatory cells; tonsils) and monocyte-macrophage lineages (peripheral macrophages; dendritic cells) from experiments included in the ENCODE, dbsuper, roadmapEpigenomics databases; however, also global collections of superenhancers/ enhancers and brain resident lineages appeared far from the bottom-left corner of Fig. 2 (the control datasets) ( Fig. 2A-C, see also Additional File: Table S6 and the online resource). Even though immune cells prevailed also in VDR top hits, a less stringent polarization was seen, somehow reflecting the wide-spreading actions of Table 2. Enrichment of MS-GWAS regions (at ± 50,100,200 kb range of extension) in lists (number in brackets in the right-most column) of DNA binding sites of human and viral molecular transducers; significant results (p < 0.05, corresponding to a − log (p) > 1.301) in bold. www.nature.com/scientificreports/ this transducer in human biology (Fig. 2D). However, with a more stringent cutoff of Harmonic Score > 40 that selects the most significant hits (Fig. Supplement 2), a core subset of MS-relevant cell lineages, shared across all four examined transducers, became evident (Additional File: Table S7).

A data resource for future research on transcriptional regulation in MS. A public web interface
for browsing the results of our colocalization analysis is freely available at www. mscol oc. com. This is a comprehensive genomic atlas disentangling specific aspects of MS gene-environment interactions to support further research on transcriptional regulation in MS. It includes the whole list of results derived from ROI, DBRs and database matches (Fig. 3a) across all performed experiments that yielded significant results. The user can navigate across the results and perform tailored queries searching and filtering for a variety of parameters, including MS-associated variant, DBR, experimental cell type, other match details (see Fig. 3b for all available search and filter modalities). Moreover, personalized HS, p-value, support and Odd Ratio threshold can easily be set to screen results, that are readily displayed in tabular format. To provide an example, we select "AID, EBNA2, EBNA3C, VDR" in the 'Matched DBR region (s)' panel and obtain the list of MS-associated SNPs (that proved to be enriched in genomic regions plausibly coding for trRNA) targeted by all four transducers (Fig. 3b,c).  (Table 3): ABC gene-enhancers connections were found for for 10 out of 84 hotspot SNPs, corresponding to 31 genes (Table 4 and Fig. 4). As expected from the pleiotropy of enhancer activity, many MS-trRNA hotspots were linked to multiple genes differentially regulated in distinct cell types: for example, the MS-trRNA hotspot in rs11026091 was linked to MRGPRE in T cells and MRGPRG-AS1 in B cells (see also Additional File: Table S9). Results included regulators of immune cell activity (MAP3K8, GIMAP8, TMEM176A, TMEM176B), ion channels and solute carriers (KCNH2, KCNMA1, SLC25A42), and transcriptional modulators (ICE2, SIN3B, NWD1).
Moreover, in most cases, the ABC-identified genes differed from the candidate genes reported in MS GWAS, underscoring the relevance of integrative approaches to annotate statistical genomic associations.

Discussion
Our study supports the hypothesis that investigations on the transient transcriptome may contribute to clarify how the GWAS signals affect the etiopathogenesis of MS and possibly of other complex disorders. Specifically, we show that genomic regions coding for the transient transcriptome recently described in T cells 25 , are significantly enriched for both MS-associated GWAS variants, as well as for DNA binding sites for protein 'transducers' of non-genetic signals, chosen among those plausibly associated to MS. The colocalization of GWAS intervals and some DNA-binding factors involved in MS etiology has already been reported [17][18][19] . Here we reinforce this premise and extend the result to AID, whose DBRs were not previously correlated to MS-associated genetic signals. The result is of relevance considering the role of AID in B cell biology and the high effectiveness of B cell-depleting approaches recently introduced in clinical practice to tackle the disease progression (Cencioni et al. 2021). Our colocalization analysis suggests a model in which trRNA-coding regions are hotspots of convergence between genetic ad non-genetic factors of risk/protection for MS. These hotspots are shared by two or more of the chosen transducers, indicating possible additive pathogenic effects or a multi-hits model to reach the threshold for MS development (see Fig. 3c and Additional File: Table S4). This model may reconcile previous evidences coming from ours and others' studies on MS etiology: genetic susceptibility plausibly exerts a soft effect (with the notable exception of the major histocompatibility complex variants, that are known to directly shape the repertoire of the (auto-)immune effectors); in fact, single base changes in GWAS loci could conceivably lead to subtle changes in TT expression, and twin studies in Mediterranean areas showed a disease concordance as low as 1 out of 10 identical twin pairs (Ristori et al. 2006). A likely higher weight has the non-genetic component, that seems to be multiple and heterogenous (with the notable exception of EBV, the most recurring and convincing risk factor for MS development; Ascherio et al., 2001), and that may favor stochastic events, by prevalently acting on genome regions coding for TT.
In homeostatic conditions, it can be hypothesized that DNA sequences coding for trRNA are composed of regulatory regions where genetic variability and non-genetic signals interact to finely regulate the gene expression according to cell identity, developmental or adaptive states, and time-dependent stimuli. As a matter of fact, the sequence variability of these regions and the strict time-dependence of their transcription could be instrumental to adaptive features; however, these same features make these regions susceptible to become dysfunctional or to www.nature.com/scientificreports/ be the targets of pathogenic interaction. In some instances, these detrimental interactions come from outside the cell, such as in the case of EBV interference with host transcription 38,39 , and the pathogenic consequences of vitamin D deficiency; in other cases, the dysfunction develops within the cell, such as the tumorigenic activity of AID in B cells 40,41 .
To support the relationship between trRNA and transcription of regulatory DNA regions, we matched a large dataset of enhancers and super-enhancers with MS-GWAS signals and DBR for VDR, EBNA2, EBNA3C and AID. The significant enrichment in cell lines and cell status coming from the hematopoietic lineages and the CNS-specific cell subsets corroborates data coming from recent reports showing the relevance of contextualizing and prioritizing the role of MS-associated GWAS signals 33,34,42,43 ). Our analysis supports the pivotal regulatory role of enhancer transcription (i.e., a main component of transient transcriptome) that was recently reported as not dispensable for gene expression at the immunoglobulin locus and for antibody class switch recombination 44 , though more research is needed to unravel such topic at a finer grain.
Reports on the dynamics of time-course data are a recent area of focus within the analysis of gene expression, specifically in immune cells. Although current studies use methods that investigate time points related to the stable transcriptome (RNA-seq performed with time spans of hours), they clearly show that gene expression dynamics may influence allele specificity, regulatory programs that seem to depend on autoimmune diseaseassociated loci, and different transcriptional profiles based on cell status after stimulation 45 . A recent work showed that an IL2ra enhancer, which harbors autoimmunity risk variants and was one of the first MS-associated loci from GWAS, has no impact on the gene level expression, but rather affects gene activation by delaying transcription in response to extracellular stimuli 46 . The importance of the timing in the gene expression control emerges also from several studies implicating enhancers and super-enhancers in the process of phase separation and formation of nuclear condensates, where the transcriptional apparatus steps-up to drive robust genic responses (Sabari et al., 2018). The overall process seems to be highly dynamic, with time spans of seconds or minutes, and hence compatible with the temporal features of the transient transcriptome, which could somehow contribute to the formation of these phase-separated condensates.
We suggest that studies on transient transcriptomes may integrate previous RNA-seq data in accounting for the interplay between genetic variability and non-genetic etiologic factors leading to MS development. Possible correlation between transient and persisting transcriptome obtained in ex-vivo and in-vivo experimental settings of neuroinflammation may help to better decipher the genomic regulatory syntax driven by non-coding DNA variants. In this context our results on 'hotspots' , MS-associated trRNAs, and those obtained in the paper describing ABC mapping (Nasser et al., 2021) are concordant in identifying regulated additional genes, besides those resulting from current interpretations on GWAS data (Table 4 and Additional File: Table S6), thus revealing a complex scenario in cell-specific gene-enhancers interaction that supports the need of a wider approach in characterizing plausibly causal genes.
Components of a more-complex-than-anticipated regulation of gene expression could include transcriptional noise, transitory time-courses, erratic dynamics, and highly flexibility of some DNA regions, possibly oscillating between bistable states of enhancer and silencer 47 . Our analysis provides a platform for future studies on transient transcriptome, which we support by making our data resource available at www. mscol oc. com. New gene regulatory models may emerge from this approach in order to better evaluate the meaning of GWAS in complex (ROI region of interest, DBR DNA binding regions, ENCODE TFBS transcription factor binding site). The figure shows the tracks we considered for the colocalization analyses. In brief, the ROI included the DBRs of MS-related viral and human transducers and was matched with MS-associated SNPs extended by 50, 100, and 200 kilobases that colocalize with regions plausibly coding for trRNAs (Database). As a control (Universe), we took from ENCODE the entire list of transcription factors binding sites. Results were considered significant if a colocalization was found across ROI and a Databases element without occurring in the Universe as a statistically significant match. (B-E), Colocalization results of EBNA2, EBNA3C, AID, VDR. The charts display results of all matches, i.e., with MS-associated SNPs and their extension at ± 50, 100, 200 kb. X-axis shows the Odd Ratio, y-axis shows the − log (pValue); dot size is proportional to to the Harmonic Score (HS). Thus, prioritized hits are represented by dots that occupy the upper-right area of the chart. Dots are coloured by cell type. Top-scoring hits in each subgroup are labeled; labels were arbitrarily designated according to the database of origin and the cell lineage where the enrichment occurred.

Methods
Data sources. Analyses were performed in Python and R. A data freeze was applied on 3/1/2020. All GWAS data was gathered from the GWAS Catalog through its REST API 32 ; about 1.5% of this data was filtered out as part of a QC process aimed at homogenizing legacy and more recent data. The MS GWAS regions were extracted from the overall GWAS Catalog data filtering by trait EFO_0003885. All Transcription Factor Binding Site regions (TFBS) were obtained from the ENCODE portal 49 . All data was organized in various databases and data pipelines as detailed below. A modular and parallel data pipeline was created to: (1) readily generate and evaluate all experiments in the paper, (2) manage and organize all data coming from various region collections (42,075 ROI regions; 4,697,782 regions plausibly coding for trRNAs; 13,309,757 Universe regions), multiple ROIs (MS GWAS, EBNA2, EBNA3C, VDR, AID, etc.), databases of vast background regions as they were populated with the data obtained from GWAS Catalog, ENCODE, and other raw data sources, (3) provide overlaps and intersection among various data elements, annotate them with the original MS GWAS loci that generated the signal, and (4) generate the overarching data resource available at www. mscol oc. com.
ABC functional mapping. The Activity-By-Contact model was applied to map genes regulated by selected MS-trRNA colocalization hotspots. Briefly, this model identifies gene-enhancers connection taking into account chromatin accessibility (ATAC-seq and DNase-seq experiments), histone modifications (H3K27ac ChIP-seq), and chromatin conformation (Hi-C) 50 . ABC analysis was performed using the ABC pipeline outputs for 131 cell types and tissues 51 . Gene-enhancers maps were produced through https:// fleks chas. github. io/ enhan cer-genevis/. Pathway and process enrichment analysis of mapped genes with the highest ABC score for each coloc region was performed through Metascape 52 , using the entire human genome as background and the following ontology    ), support, and Odds Ratio (OR) were combined into a single score inspired by the harmonic mean 54 and multi-objective optimization 55 with the formula below, where the spacing parameter k p was set to 10.0 and we consider all three contributors equally, setting therefore weights w i to 1.0. Statistical significance was taken at p < 0.05.
For pathway analysis in Metascape, enrichment p-values were calculated based through the accumulative hypergeometric distribution, q-values were calculated using the Benjamini-Hochberg method to account for multiple testing.

Data availability
The dataset supporting the conclusions of this article is available at the website www. mscol oc. com.