Multiple sclerosis genetic and non-genetic factors interact through the transient transcriptome

Umeton, Renato; Bellucci, Gianmarco; Bigi, Rachele; Romano, Silvia; Buscarinu, Maria Chiara; Reniè, Roberta; Rinaldi, Virginia; Pizzolato Umeton, Raffaella; Morena, Emanuele; Romano, Carmela; Mechelli, Rosella; Salvetti, Marco; Ristori, Giovanni

doi:10.1038/s41598-022-11444-w

Article
Open access
Published: 09 May 2022

Scientific Reports volume 12, Article number: 7536 (2022)

Abstract

A clinically actionable understanding of multiple sclerosis (MS) etiology requires interpreting GWAS results, which in turn prompts research on new gene regulatory models. Our previous investigations suggested heterogeneity in etiology components and stochasticity in the interaction between genetic and non-genetic factors. To find a unifying model for this evidence, we focused on the recently mapped transient transcriptome (TT), which is mostly coded by intergenic and intronic regions and has a half-life of minutes. Through a colocalization analysis, we demonstrate here that genomic regions coding for the TT are significantly enriched for MS-associated GWAS variants and for DNA binding sites of molecular transducers mediating putative non-genetic determinants of MS (vitamin D deficiency, Epstein-Barr virus latent infection, B cell dysfunction), indicating TT-coding regions as MS etiopathogenetic hotspots. Future research comparing cell-specific transient and stable transcriptomes may clarify the interplay between genetic variability and non-genetic factors causing MS. To this end, our colocalization analysis provides a freely available data resource at www.mscoloc.com.

Introduction

A large body of literature agrees that regulatory genomic intervals, especially those encompassing enhancers, are enriched with disease-associated DNA elements. Most of this evidence comes from genome-wide association studies (GWAS) based on single nucleotide polymorphisms (SNPs) representing common variants^1,2,3,4,5, even though a recent study showed that low-frequency and rare coding variants may also contribute to multifactorial diseases⁶. Several characteristics of regulatory disease-associated genetic variants complicate GWAS interpretation, prompting research on new gene regulatory models: (i) SNPs are chosen as proxies for haplotypes to spare the genotyping work needed for the large number of samples used in GWAS, therefore fine mapping and epigenetic studies are required to integrate GWAS data^7,8,9,10; (ii) only a fraction of supposedly causal disease-associated variants directly alters recognizable transcription factor binding motifs, although such alteration might be expected given their regulatory function⁴; (iii) the identified GWAS signals are likely to exert highly contextual (i.e., time- and position-dependent) regulatory effects that may change according to the tissue and to the time at which they receive an input from inside or outside the cell. In summary, current gene regulatory models only partly explain which disease-associated SNP signals are causal, and by which exact mechanisms. Recent studies on the biological spectrum of human DNase I hypersensitive sites (DHSs), which are disease-associated markers of regulatory DNA, may help to rework GWAS data and, in particular, to contextualize the genomic variants according to tissue/cell states and to gene body colocalization of DHSs¹¹. In this context, the latest version of the Genotype-Tissue Expression project may provide further insights into the tissue specificity of genetic effects, supporting the link between regulatory mechanisms and traits or complex diseases¹².

Such layers of complexity are in agreement with our previous investigations on the interplay between genetic and non-genetic factors contributing to multiple sclerosis (MS) development: twin-pair studies^13,14; disease modelling, which suggested stochastic phenomena (i.e., random events not necessarily resulting in disease in all individuals) contributing to disease onset and progression^15,16; and bioinformatics analyses, which determined a significant enrichment of binding motifs for Epstein-Barr virus (EBV) nuclear antigen 2 (EBNA2) and vitamin D receptor (VDR) in genomic regions containing MS-associated GWAS variants¹⁷. We also showed that genomic variants of EBNA2 are MS-associated¹⁸, and other groups expanded our findings, showing that the enrichment of EBNA2-binding regions in GWAS DNA intervals is involved in the pathogenesis of autoimmune disorders, including MS¹⁹. The role of EBV proteins and/or VDR as key transcriptional regulators in MS is consistent with well-known sero-epidemiological evidence on the virus as a risk factor for disease development²⁰, and on vitamin D deficiency associated with different disease prevalence in diverse geographic areas²¹. Recent works have further reinforced the causal role of EBV and its mechanistic link to MS development^22,23.

A recent sequencing innovation (namely, TT-seq) made it possible to map the transient transcriptome, which has a typical half-life of minutes, in contrast to stable RNA elements, such as protein-coding mRNAs, long non-coding RNAs, and micro-RNAs, which persist for at least a few hours^24,25,26. The transient transcriptome (TT) includes mostly enhancer RNAs (eRNA), short intergenic non-coding RNAs (sincRNA) and antisense RNAs (asRNA). eRNAs are bidirectionally transcribed from mammalian genome regions that carry specific histone modifications, and they finely regulate chromatin conformation and transcription. Unlike promoters, enhancers can execute their functions regardless of orientation, position and spatial segregation from target genes, which may be affected both in cis and in trans by eRNAs. Most sincRNAs are defined as genomic regions located within 10 kbp of a GENCODE mRNA transcription start site. Overall, these transient RNAs (trRNA) are relatively short, generally lack a secondary structure, and do not carry the chemical modifications that characterize unidirectional and polyadenylated stable RNAs^24,27. Other recent works based on time-resolved analysis agree on a model of very rapid functional dynamics for eRNAs as they interact with the transcriptional co-activator acetyltransferase CBP/p300 complex^28,29. This confirms the highly contextual role of eRNAs through the control of transcription burst frequencies, which are known to influence cell-type-specific gene expression profiles³⁰. Along these lines, a recent study showed that T cells selectively filter oscillatory signals within the minute timescale³¹, further supporting the aforementioned model.

On these bases, we leveraged the recent sequencing innovations in the mapping of the transient transcriptome, in particular the work by Michel et al. on T lymphocytes (which are known to play a major role in MS pathogenesis), which applied both the TT-seq and the RNA-seq protocols to Jurkat cells during their immediate response to stimulation with ionomycin and phorbol 12-myristate 13-acetate (PMA). Michel's study made it possible to compare trRNAs and mRNAs with high temporal resolution, showing that TT-seq, but not RNA-seq, caught rapid changes in transcriptional activity as early as 15 min after stimulation^24,25. We hypothesized that MS-associated GWAS signals prevalently fall within regulatory regions of DNA coding for trRNAs. In theory, the genomic intervals coding for this transient transcriptome may be the hotspots where temporospatial occurrences coalesce and so contribute to physiological (developmental and/or adaptive) outcomes, or possibly give rise to disease onset or progression. This study aims to verify this working hypothesis through a colocalization analysis and its further dissection in the context of MS.

Results

MS-associated GWAS signals colocalize with regulatory regions of DNA plausibly coding for trRNAs

We set up our region of interest (ROI) within the GWAS Catalog³² by considering all published MS GWAS, extracting all SNP positions, and creating a single set of genomic coordinates that therefore encompasses all GWAS-derived or GWAS-verified signals for MS. We then refined the SNP list by pruning about 1.5% of the SNPs, which lacked intelligible genomic annotations or were duplicates. The final ROI list is reported in Additional File: Table S1 and consists of 603 unique single-nucleotide regions. To provide a “threshold” against which the ROI <> Database match would be benchmarked, we used as Universe 107,423 regions, corresponding to the signals from the entire GWAS Catalog.
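The pruning step described above can be illustrated with a minimal sketch. It assumes that each GWAS record is a dictionary with hypothetical keys `chrom`, `pos` and `rsid`; these field names and the function `build_roi` are illustrative and are not taken from the authors' pipeline or from the GWAS Catalog schema.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    chrom: str  # chromosome label, e.g. "6"
    pos: int    # single-nucleotide position
    rsid: str   # variant identifier (may be empty)


def build_roi(records):
    """Drop records without usable coordinates, then de-duplicate by position.

    `records` is an iterable of dicts with the hypothetical keys
    'chrom', 'pos' and 'rsid'.
    """
    seen = set()
    roi = []
    for rec in records:
        chrom, pos = rec.get("chrom"), rec.get("pos")
        # Prune entries that lack an intelligible genomic annotation.
        if not chrom or not str(pos).isdigit():
            continue
        key = (str(chrom), int(pos))
        # Prune duplicates, e.g. the same SNP reported by several studies.
        if key in seen:
            continue
        seen.add(key)
        roi.append(Snp(key[0], key[1], rec.get("rsid", "")))
    return roi
```

Keying on chromosome and position rather than on the variant identifier is a design choice of this sketch: it collapses records that point to the same single-nucleotide region even when their identifiers differ.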

Next, through colocalization analyses, we matched our ROI with lists of regions from the work by Michel et al., which mapped the transient and stable transcriptome captured by TT-seq after T cell stimulation²⁴. We found a significant enrichment of MS-associated genetic variants in the transient transcriptome (p-value = 2.80 × 10⁻⁹; Table 1). Of note, when we split the transcriptome list into two subsets for long (≥ 60 min) and short (< 60 min) half-life, we found that only the short half-life subset significantly colocalized with the ROI (p-value 2.06 × 10⁻⁸ vs. 0.09). This finding indicated a relationship between MS-associated GWAS signals and the regulatory regions of DNA coding for trRNAs.
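Enrichment p-values of this kind are typically obtained from a 2×2 contingency table that compares the ROI with the Universe. The sketch below shows a one-sided Fisher's exact test under the assumption that the ROI is a subset of the Universe; the function name and argument names are hypothetical, and the code is an illustration of the test rather than the LOLA implementation used in the study.

```python
from math import comb


def lola_style_enrichment(support, roi_size, db_hits, universe_size):
    """One-sided Fisher's exact test for over-representation.

    support       -- ROI regions overlapping the database set
    roi_size      -- total ROI regions (assumed to be a subset of the universe)
    db_hits       -- universe regions overlapping the database set (ROI included)
    universe_size -- total universe regions
    Returns (p_value, odds_ratio).
    """
    a = support                        # in ROI, overlapping
    b = db_hits - support              # outside ROI, overlapping
    c = roi_size - support             # in ROI, not overlapping
    d = universe_size - roi_size - b   # outside ROI, not overlapping
    if min(a, b, c, d) < 0:
        raise ValueError("counts are inconsistent with ROI being a subset of the universe")
    # Upper hypergeometric tail: P(X >= support) if the ROI were drawn at random.
    tail = sum(
        comb(db_hits, k) * comb(universe_size - db_hits, roi_size - k)
        for k in range(support, min(db_hits, roi_size) + 1)
    )
    p_value = tail / comb(universe_size, roi_size)
    odds_ratio = float("inf") if b * c == 0 else (a * d) / (b * c)
    return p_value, odds_ratio
```

Exact integer arithmetic keeps the tail sum stable even when the individual binomial coefficients are very large, at the cost of speed on big universes.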

Table 1 Enrichment of MS-associated genetic variants in lists of T-cell transient transcripts extracted from Michel et al., 2017.

When we further dissected the mapping of the ROI colocalization signals, we found a significant excess of intergenic and intronic regions (as anticipated), as well as their prevalent distribution away from the transcription start site (TSS; Supplementary Fig. S1A). Notably, when we extended this analysis to GWAS data from other multifactorial diseases or traits, divided into immune-mediated and other complex conditions, we found highly comparable profiles (Supplementary Fig. S1B–C, Additional File: Table S2), suggesting that the colocalization between MS-associated DNA intervals and intergenic or intronic sequences, plausibly referring to trRNA coding regions, is shared by the genetic architecture of most multifactorial disorders.

To consolidate this result and gain deeper biological insight, we extended the colocalization analysis by matching the ROI with a vast set of databases of regulatory DNA regions, including enhancers and super-enhancers, derived from experiments on diverse tissue types (a total of 4,697,782 DNA regions, plausibly coding for trRNA, were extracted from a wide variety of raw data sources; referenced in Additional File: Table S3). To improve the interpretability of the results through ranking, we implemented a harmonic score (HS) based on the Odds Ratio, the −log(p-value), and the support of each match. Statistically significant results came from sets included in SEA, seDB, dbSuper and other single lists of enhancers and non-coding RNAs derived from experiments on diverse tissue types, not necessarily belonging to immune cell lineages (Fig. 1, black dots; Additional File: Table S4). At the same time, we found a strong enrichment of MS-associated genetic variants in cell lines of hematopoietic lineage, including CD19+ and CD20+ B lymphocytes, CD4+ T helper cells, and CD14+ monocytes. This is in line with the GWAS data and the known immunopathogenesis of the disease, as well as with the fact that the TT collection used for the colocalization analysis comes from a lymphoid line. Moreover, among the significant hits, we found collections from brain-resident cells, in particular microglia-specific enhancers, in line with recent reports on brain cell type-specific enhancer-promoter interactome activities and with the latest GWAS on MS genomic mapping^33,34. Non-relevant tissues serving as controls (such as kidney, muscle, glands, etc.) scored low in the ranking, crowding the bottom-left corner of Fig. 1 (grey dots; see also Additional File: Table S4).

Genetic and non-genetic factors for MS etiology converge in genomic regions plausibly coding for the transient transcriptome

Independent studies support the fact that MS GWAS intervals are enriched with DNA binding regions (DBRs) for protein ‘transducers’ mediating non-genetic factors of putative etiologic relevance in MS, such as vitamin D deficiency or EBV latent infection^17,19. Therefore, we further inquired whether DNA regions plausibly coding for trRNA share these features (i.e., whether they colocalize with such DBRs). We set up 4 new ROIs corresponding to the DBRs for VDR, activation-induced cytidine deaminase (AID), EBNA2, and Epstein-Barr nuclear antigen 3C (EBNA3C), chosen among viral or host nuclear factors potentially associated with MS etiopathogenesis^35,36,37. The DBRs for each nuclear factor were derived from recent literature (Additional File: Table S5) and matched with the GWAS-derived MS signals to confirm and expand previous results. We found statistically significant results for VDR, EBNA2, and AID at all SNP position extensions (± 50, 100, 200 kb up- and downstream), while for EBNA3C significant results emerged at extensions of ± 100 and 200 kb. This finding suggests that several DBRs can impact the MS-associated DNA intervals through colocalization (Table 2).
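The effect of extending each SNP position by a fixed flank can be sketched as follows. This is a minimal illustration on closed intervals with a naive all-against-all comparison; the data layout (a list of `(chrom, pos)` pairs and a dictionary of DBR intervals per chromosome) and all function names are assumptions made for the example.

```python
def extend(pos, flank):
    """Return the closed interval [pos - flank, pos + flank], clipped at 1."""
    return max(1, pos - flank), pos + flank


def overlaps(a, b):
    """True if two closed intervals on the same chromosome share a base."""
    return a[0] <= b[1] and b[0] <= a[1]


def count_supported(snps, dbrs, flank):
    """Count SNPs whose extended window overlaps at least one DBR.

    snps -- list of (chrom, pos) pairs
    dbrs -- dict mapping chrom to a list of (start, end) intervals
    """
    return sum(
        any(overlaps(extend(pos, flank), dbr) for dbr in dbrs.get(chrom, []))
        for chrom, pos in snps
    )


# Toy data (not from the study): widening the window can only add overlaps.
toy_snps = [("1", 1_000_000), ("2", 500_000)]
toy_dbrs = {"1": [(1_060_000, 1_061_000)], "2": [(680_000, 681_000)]}
toy_counts = [count_supported(toy_snps, toy_dbrs, f) for f in (50_000, 100_000, 200_000)]
```

Because the support is non-decreasing in the flank size, a factor whose binding regions lie further from the SNPs can reach significance only at the wider extensions. For genome-scale inputs an interval tree or a sorted sweep would replace the naive loop.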

Table 2 Enrichment of MS-GWAS regions (at ± 50, 100, 200 kb range of extension) in lists (number in brackets in the right-most column) of DNA binding sites of human and viral molecular transducers; significant results (p < 0.05, corresponding to −log(p) > 1.301) in bold.

Building once again on the work by Michel et al.²⁵, we inquired whether there was a colocalization among genomic regions containing MS-associated variants, DBRs for VDR, EBNA2, EBNA3C, AID, and DNA intervals plausibly coding for trRNA. To this end, we considered the transient transcriptome that proved to be enriched with MS-associated variants (Table 1), and we then matched the corresponding coding regions with the DBRs for the four molecular transducers. For this analysis, the DBRs for EBNA2 (6880 regions), EBNA3C (3835 regions), AID (4823 regions), and VDR (23,409 regions) represented the ROI, while the ENCODE database of Transcription Factor Binding Sites served as Universe (13,202,334 regions; Fig. 2a). We report the results of this analysis in Table 3, which shows the significant colocalization of DNA regions plausibly coding for trRNAs with both MS-relevant GWAS signals and the DBRs of 3 out of 4 factors that are active at the nuclear level and potentially associated with MS. The DBR for EBNA3C did not reach statistical significance, though it showed higher values of support for short half-life transcripts.

Table 3 Colocalization of human and viral transducer DBRs and MS-GWAS positions (at ± 50, 100, 200 kb range of extension) in DNA regions coding for transient transcripts; significant results (p < 0.05, corresponding to −log(p) > 1.301) in bold.

To review and confirm previous colocalizations, we considered the genomic regions resulting from the above-reported match between the MS-associated GWAS intervals and the databases of regulatory DNA regions, containing enhancers and super-enhancers, plausibly enriched in trRNA-coding sequences (Fig. 1 and the online data resource). We then matched these DNA regions with the DBRs for VDR, EBNA2, EBNA3C and AID, finding significant enrichments that make it possible to contextualize and prioritize genomic positions, cell/tissue identity or cell status associated with MS. Considering the harmonic score obtained from these colocalization analyses, the top hits for EBNA2, EBNA3C, and AID involved lymphoid (CD19+ B cell lines and lymphomas; T regulatory cells; tonsils) and monocyte-macrophage lineages (peripheral macrophages; dendritic cells) from experiments included in the ENCODE, dbSuper, and Roadmap Epigenomics databases; however, global collections of super-enhancers/enhancers and brain-resident lineages also appeared far from the bottom-left corner of Fig. 2, where the control datasets lie (Fig. 2A–C; see also Additional File: Table S6 and the online resource). Even though immune cells also prevailed among the VDR top hits, a less stringent polarization was seen, possibly reflecting the wide-ranging actions of this transducer in human biology (Fig. 2D). However, with a more stringent cutoff of Harmonic Score > 40, which selects the most significant hits (Supplementary Fig. S2), a core subset of MS-relevant cell lineages, shared across all four examined transducers, became evident (Additional File: Table S7).

A data resource for future research on transcriptional regulation in MS

A public web interface for browsing the results of our colocalization analysis is freely available at www.mscoloc.com. This is a comprehensive genomic atlas that disentangles specific aspects of MS gene-environment interactions to support further research on transcriptional regulation in MS. It includes the whole list of results derived from ROI, DBR and database matches (Fig. 3a) across all performed experiments that yielded significant results. Users can navigate the results and perform tailored queries, searching and filtering by a variety of parameters, including MS-associated variant, DBR, experimental cell type, and other match details (see Fig. 3b for all available search and filter modalities). Moreover, custom thresholds for HS, p-value, support and Odds Ratio can easily be set to screen results, which are displayed in tabular format. As an example, selecting “AID, EBNA2, EBNA3C, VDR” in the ‘Matched DBR region (s)’ panel returns the list of MS-associated SNPs (that proved to be enriched in genomic regions plausibly coding for trRNA) targeted by all four transducers (Fig. 3b,c). Through this approach we searched for MS-associated regions shared by the DBRs analyzed, and we were able to prioritize 275 genomic regions (almost half of the MS-associated GWAS SNPs) capable of binding at least 2 molecular transducers. These regions are ‘hotspots’ of interaction between genetic and non-genetic modifiers of MS risk/protection: all four proteins (VDR, AID, EBNA2, EBNA3C) targeted 24 regions, three of them targeted 115 regions, and two of them targeted 136 regions. A detailed legend and more example queries may be found on the online data resource website.
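The hotspot prioritization amounts to counting, for each MS-associated region, how many transducers have a DBR that matches it. A minimal sketch follows; the mapping `toy_hits` and its region identifiers are invented for illustration, and `tally_by_transducer_count` is a hypothetical helper, not part of the online resource. The dictionary `reported` holds the counts stated in the text above.

```python
from collections import Counter


def tally_by_transducer_count(hits, minimum=2):
    """Count regions by the number of transducers matching them.

    hits    -- dict mapping a region id to the set of matched transducers
    minimum -- smallest number of transducers for a region to count as a hotspot
    Returns a Counter: number of transducers -> number of regions.
    """
    return Counter(len(t) for t in hits.values() if len(t) >= minimum)


# Toy input with invented region ids.
toy_hits = {
    "region_a": {"VDR", "AID", "EBNA2", "EBNA3C"},
    "region_b": {"VDR", "AID", "EBNA2"},
    "region_c": {"VDR", "EBNA2"},
    "region_d": {"VDR"},  # bound by a single transducer, so not a hotspot
}
toy_tally = tally_by_transducer_count(toy_hits)

# Counts reported in the text: transducers per region -> number of regions.
reported = {4: 24, 3: 115, 2: 136}
total_hotspots = sum(reported.values())  # 24 + 115 + 136 = 275
```

The reported breakdown is internally consistent: the three classes sum to the 275 prioritized regions, which is about 46% of the 603 ROI positions.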

Finally, to obtain a functional mapping of MS-trRNA regions, we attempted to identify MS-relevant genes by integrating our results with the ‘activity-by-contact’ (ABC) model (Fulco et al., 2019; Nasser et al., 2021), which was recently developed to define cell-specific gene-enhancer connections according to chromatin conformation and accessibility, as well as to histone acetylation-methylation status. We retrieved a total of 77 gene-enhancer pairings (Additional File: Table S8), enriched in the IL6-JAK-STAT3, IL-18, and IL2RB pathways. Among these, we focused on MS variant-trRNA colocalization hotspots targeted by all four molecular transducers (AID, EBNA2, EBNA3C, VDR; n = 24) or by three of them (AID, EBNA2, VDR; n = 60; see also Fig. 3c), excluding EBNA3C in the latter case because it did not reach statistical significance in the previous analysis (Table 3). ABC gene-enhancer connections were found for 10 out of 84 hotspot SNPs, corresponding to 31 genes (Table 4 and Fig. 4). As expected from the pleiotropy of enhancer activity, many MS-trRNA hotspots were linked to multiple genes differentially regulated in distinct cell types: for example, the MS-trRNA hotspot in rs11026091 was linked to MRGPRE in T cells and MRGPRG-AS1 in B cells (see also Additional File: Table S9). Results included regulators of immune cell activity (MAP3K8, GIMAP8, TMEM176A, TMEM176B), ion channels and solute carriers (KCNH2, KCNMA1, SLC25A42), and transcriptional modulators (ICE2, SIN3B, NWD1).

Table 4 Activity-By-Contact (ABC) functional mapping of MS-trRNA hotspots bound by 4/4 MS-relevant transducers or only 3/4 (AID, EBNA2, VDR).

Moreover, in most cases, the ABC-identified genes differed from the candidate genes reported in MS GWAS, underscoring the relevance of integrative approaches to annotate statistical genomic associations.

Discussion

Our study supports the hypothesis that investigations on the transient transcriptome may help clarify how GWAS signals affect the etiopathogenesis of MS and possibly of other complex disorders. Specifically, we show that genomic regions coding for the transient transcriptome recently described in T cells²⁵ are significantly enriched both for MS-associated GWAS variants and for DNA binding sites for protein ‘transducers’ of non-genetic signals, chosen among those plausibly associated with MS. The colocalization of GWAS intervals and some DNA-binding factors involved in MS etiology has already been reported^17,18,19. Here we reinforce this premise and extend the result to AID, whose DBRs were not previously correlated with MS-associated genetic signals. The result is of relevance considering the role of AID in B cell biology and the high effectiveness of B cell-depleting approaches recently introduced in clinical practice to tackle disease progression (Cencioni et al. 2021). Our colocalization analysis suggests a model in which trRNA-coding regions are hotspots of convergence between genetic and non-genetic factors of risk/protection for MS. These hotspots are shared by two or more of the chosen transducers, indicating possible additive pathogenic effects or a multi-hit model to reach the threshold for MS development (see Fig. 3c and Additional File: Table S4). This model may reconcile previous evidence from our own and others’ studies on MS etiology: genetic susceptibility plausibly exerts a soft effect (with the notable exception of the major histocompatibility complex variants, which are known to directly shape the repertoire of the (auto-)immune effectors); in fact, single base changes in GWAS loci could conceivably lead to subtle changes in TT expression, and twin studies in Mediterranean areas showed a disease concordance as low as 1 out of 10 identical twin pairs (Ristori et al. 2006). The non-genetic component likely carries a higher weight; it seems to be multiple and heterogeneous (with the notable exception of EBV, the most recurring and convincing risk factor for MS development; Ascherio et al., 2001), and it may favor stochastic events by prevalently acting on genome regions coding for the TT.

In homeostatic conditions, it can be hypothesized that DNA sequences coding for trRNA are composed of regulatory regions where genetic variability and non-genetic signals interact to finely regulate gene expression according to cell identity, developmental or adaptive states, and time-dependent stimuli. Indeed, the sequence variability of these regions and the strict time-dependence of their transcription could be instrumental to adaptive features; however, these same features make these regions liable to become dysfunctional or to be targeted by pathogenic interactions. In some instances, these detrimental interactions come from outside the cell, as in the case of EBV interference with host transcription^38,39 and the pathogenic consequences of vitamin D deficiency; in other cases, the dysfunction develops within the cell, as with the tumorigenic activity of AID in B cells^40,41.

To support the relationship between trRNA and transcription of regulatory DNA regions, we matched a large dataset of enhancers and super-enhancers with MS-GWAS signals and the DBRs for VDR, EBNA2, EBNA3C and AID. The significant enrichment in cell lines and cell states from the hematopoietic lineages and the CNS-specific cell subsets corroborates recent reports showing the relevance of contextualizing and prioritizing the role of MS-associated GWAS signals^33,34,42,43. Our analysis supports the pivotal regulatory role of enhancer transcription (i.e., a main component of the transient transcriptome), which was recently reported to be indispensable for gene expression at the immunoglobulin locus and for antibody class switch recombination⁴⁴, though more research is needed to unravel this topic at a finer grain.

Reports on the dynamics of time-course data are a recent area of focus within the analysis of gene expression, specifically in immune cells. Although current studies use methods that investigate time points related to the stable transcriptome (RNA-seq performed with time spans of hours), they clearly show that gene expression dynamics may influence allele specificity, regulatory programs that seem to depend on autoimmune disease-associated loci, and different transcriptional profiles based on cell status after stimulation⁴⁵. A recent work showed that an IL2ra enhancer, which harbors autoimmunity risk variants and was one of the first MS-associated loci from GWAS, has no impact on the level of gene expression, but rather affects gene activation by delaying transcription in response to extracellular stimuli⁴⁶. The importance of timing in the control of gene expression also emerges from several studies implicating enhancers and super-enhancers in the process of phase separation and formation of nuclear condensates, where the transcriptional apparatus steps up to drive robust genic responses (Sabari et al., 2018). The overall process seems to be highly dynamic, with time spans of seconds or minutes, and hence compatible with the temporal features of the transient transcriptome, which could contribute to the formation of these phase-separated condensates.

We suggest that studies on transient transcriptomes may integrate previous RNA-seq data in accounting for the interplay between genetic variability and non-genetic etiologic factors leading to MS development. Possible correlations between transient and persistent transcriptomes obtained in ex vivo and in vivo experimental settings of neuroinflammation may help to better decipher the genomic regulatory syntax driven by non-coding DNA variants. In this context, our results on ‘hotspots’ and MS-associated trRNAs, and those obtained in the paper describing ABC mapping (Nasser et al., 2021), are concordant in identifying additional regulated genes besides those resulting from current interpretations of GWAS data (Table 4 and Additional File: Table S6), thus revealing a complex scenario in cell-specific gene-enhancer interactions that supports the need for a wider approach in characterizing plausibly causal genes.

Components of a more-complex-than-anticipated regulation of gene expression could include transcriptional noise, transitory time-courses, erratic dynamics, and high flexibility of some DNA regions, possibly oscillating between bistable states of enhancer and silencer⁴⁷. Our analysis provides a platform for future studies on the transient transcriptome, which we support by making our data resource available at www.mscoloc.com. New gene regulatory models may emerge from this approach to better evaluate the meaning of GWAS in complex traits and the impact of enhancer transcription⁴⁴, which was recently reported as an ancient and conserved, yet flexible, genomic regulatory syntax⁴⁸.

Methods

Data sources

Analyses were performed in Python and R. A data freeze was applied on 3/1/2020. All GWAS data was gathered from the GWAS Catalog through its REST API³²; about 1.5% of this data was filtered out as part of a QC process aimed at homogenizing legacy and more recent data. The MS GWAS regions were extracted from the overall GWAS Catalog data filtering by trait EFO_0003885. All Transcription Factor Binding Site regions (TFBS) were obtained from the ENCODE portal⁴⁹. All data was organized in various databases and data pipelines as detailed below. A modular and parallel data pipeline was created to: (1) readily generate and evaluate all experiments in the paper, (2) manage and organize all data coming from various region collections (42,075 ROI regions; 4,697,782 regions plausibly coding for trRNAs; 13,309,757 Universe regions), multiple ROIs (MS GWAS, EBNA2, EBNA3C, VDR, AID, etc.), databases of vast background regions as they were populated with the data obtained from GWAS Catalog, ENCODE, and other raw data sources, (3) provide overlaps and intersection among various data elements, annotate them with the original MS GWAS loci that generated the signal, and (4) generate the overarching data resource available at www.mscoloc.com.

ABC functional mapping

The Activity-By-Contact model was applied to map genes regulated by selected MS-trRNA colocalization hotspots. Briefly, this model identifies gene-enhancer connections by taking into account chromatin accessibility (ATAC-seq and DNase-seq experiments), histone modifications (H3K27ac ChIP–seq), and chromatin conformation (Hi-C)⁵⁰. ABC analysis was performed using the ABC pipeline outputs for 131 cell types and tissues⁵¹. Gene-enhancer maps were produced through https://flekschas.github.io/enhancer-gene-vis/. Pathway and process enrichment analysis of mapped genes with the highest ABC score for each coloc region was performed through Metascape⁵², using the entire human genome as background and the following ontology sources: GO Biological Processes, KEGG Pathway, Reactome Gene Sets, Hallmark Gene Sets, Canonical Pathways, BioCarta Gene Sets and WikiPathways.
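To make the idea behind the ABC score concrete, the sketch below follows the formulation as we understand it from the ABC literature (Fulco et al., 2019): the activity of an element is the geometric mean of its accessibility and H3K27ac signals, and the score of an element for a gene is its activity times its contact frequency with the gene's promoter, normalized over all candidate elements near that gene. The function names and the input layout are hypothetical, and this is not the published ABC pipeline whose outputs were used in the study.

```python
from math import sqrt


def activity(accessibility, h3k27ac):
    """Geometric mean of the accessibility and H3K27ac signals of one element."""
    return sqrt(accessibility * h3k27ac)


def abc_scores(elements):
    """Compute ABC-style scores for the candidate elements of a single gene.

    elements -- dict mapping an element id to a tuple
                (accessibility, h3k27ac, contact with the gene's promoter)
    Returns a dict mapping each element id to its share of the total
    activity-times-contact signal for that gene.
    """
    raw = {
        eid: activity(acc, ac) * contact
        for eid, (acc, ac, contact) in elements.items()
    }
    total = sum(raw.values())
    # With no signal at all, every element gets a score of zero.
    return {eid: (value / total if total else 0.0) for eid, value in raw.items()}
```

Because the scores of all candidate elements for one gene sum to one, a strong but distant enhancer and a weak but proximal one can receive comparable scores, which is why the mapped genes may differ from the nearest-gene annotation.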

Statistical analysis

For SNP overlaps and region colocalization, we used LOLA⁵³ and Fisher’s exact test, with the False Discovery Rate (Benjamini-Hochberg) to control for multiple testing. Linkage disequilibrium was considered as described in Sheffield & Bock, 2016. The resulting −log(p-value), support, and Odds Ratio (OR) were combined into a single score inspired by the harmonic mean⁵⁴ and multi-objective optimization⁵⁵ with the formula below, where the spacing parameter k_p was set to 10.0 and all three contributors were weighted equally by setting the weights w_i to 1.0. Statistical significance was taken at p < 0.05.

$$ \mathrm{Harmonic\ Score} = k_{p} \cdot \frac{\sum_{i} w_{i}}{\frac{w_{1}}{-\log P} + \frac{w_{2}}{\mathrm{Supp}} + \frac{w_{3}}{\mathrm{OR}}} $$
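The formula above is k_p times the weighted harmonic mean of the three contributors, and it can be sketched in a few lines. The function name is hypothetical, the logarithm is assumed to be base 10 (consistent with the table captions, where p < 0.05 corresponds to −log(p) > 1.301), and returning zero for non-positive contributors is a convention of this sketch rather than something stated in the text.

```python
def harmonic_score(neg_log_p, support, odds_ratio,
                   weights=(1.0, 1.0, 1.0), k_p=10.0):
    """Weighted harmonic combination of -log10(p), support and odds ratio.

    weights -- (w_1, w_2, w_3), all 1.0 as in the Methods
    k_p     -- spacing parameter, 10.0 as in the Methods
    """
    terms = (neg_log_p, support, odds_ratio)
    # Assumption of this sketch: a non-positive contributor yields no score.
    if any(t <= 0 for t in terms):
        return 0.0
    denominator = sum(w / t for w, t in zip(weights, terms))
    return k_p * sum(weights) / denominator
```

With equal weights, three identical contributors of value x give a score of k_p · x, so a cutoff of Harmonic Score > 40 corresponds to a harmonic mean above 4. Like any harmonic mean, the score is dominated by the smallest contributor, so a match cannot rank highly on support alone.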

For pathway analysis in Metascape, enrichment p-values were calculated using the cumulative hypergeometric distribution, and q-values were calculated using the Benjamini–Hochberg method to account for multiple testing.
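The Benjamini–Hochberg adjustment mentioned here and in the colocalization tests can be sketched as below. This is a minimal standard-library illustration of the step-up procedure with a hypothetical function name; it is not the code used by LOLA or Metascape.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

Each p-value is scaled by the number of tests divided by its rank, and the running minimum ensures that a smaller p-value never receives a larger q-value than a bigger one.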

Data availability

The dataset supporting the conclusions of this article is available at the website www.mscoloc.com.

References

Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473(7345), 43–49 (2021).
Article ADS CAS Google Scholar
Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337(6099), 1190–1195 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Gusev, A. et al. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet. 95(5), 535–552 (2014).
Farh, K. K. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518(7539), 337–343 (2015).
Vahedi, G. et al. Super-enhancers delineate disease-associated regulatory nodes in t cells. Nature 520(7548), 558–562 (2015).
International Multiple Sclerosis Genetics Consortium (IMSGC). Low-frequency and rare-coding variation contributes to multiple sclerosis risk. Cell 180(2), 403 (2020).
Mumbach, M. R. et al. Enhancer connectome in primary human cells identifies target genes of disease-associated DNA elements. Nat. Genet. 49(11), 1602–1612 (2017).
van Arensbergen, J. et al. High-throughput identification of human SNPS affecting regulatory element activity. Nat. Genet. 51(7), 1160–1169 (2019).
Calderon, D. et al. Landscape of stimulation-responsive chromatin across diverse human immune cells. Nat. Genet. 51(10), 1494–1505 (2019).
Ohkura, N. et al. Regulatory t cell-specific epigenomic region variants are a key determinant of susceptibility to common autoimmune diseases. Immunity 52(6), 1119-1132.e1114 (2020).
Meuleman, W. et al. Index and biological spectrum of human dnase I hypersensitive sites. Nature 584(7820), 244–251 (2020).
Consortium G. The gtex consortium atlas of genetic regulatory effects across human tissues. Science 369(6509), 1318–1330 (2020).
Ristori, G. et al. Multiple sclerosis in twins from continental italy and sardinia: A nationwide study. Ann. Neurol. 59(1), 27–34 (2006).
Fagnani, C. et al. Twin studies in multiple sclerosis: A meta-estimation of heritability and environmentality. Mult. Scler. 21(11), 1404–1413 (2015).
Bordi, I. et al. A mechanistic, stochastic model helps understand multiple sclerosis course and pathogenesis. Int. J. Genom. 2013, 910321 (2013).
Bordi, I. et al. Noise in multiple sclerosis: Unwanted and necessary. Ann. Clin. Transl. Neurol. 1(7), 502–511 (2014).
Ricigliano, V. A. et al. Ebna2 binds to genomic intervals associated with multiple sclerosis and overlaps with vitamin d receptor occupancy. PLoS One 10(4), e0119605 (2015).
Mechelli, R. et al. Epstein-barr virus genetic variants are associated with multiple sclerosis. Neurology 84(13), 1362–1368 (2015).
Harley, J. B. et al. Transcription factors operate across disease loci, with ebna2 implicated in autoimmunity. Nat. Genet. 50(5), 699–707 (2018).
Ascherio, A. et al. Epstein-barr virus antibodies and risk of multiple sclerosis: A prospective study. JAMA 286(24), 3083–3088 (2001).
Simon, K. C., Munger, K. L. & Ascherio, A. Vitamin d and multiple sclerosis: Epidemiology, immunology, and genetics. Curr. Opin. Neurol. 25(3), 246–251 (2012).
Bjornevik, K. et al. Longitudinal analysis reveals high prevalence of epstein-barr virus associated with multiple sclerosis. Science 375(6578), 296–301 (2022).
Lanz, T. V. et al. Clonally expanded b cells in multiple sclerosis bind ebv ebna1 and glialcam. Nature 603(7900), 321–327 (2022).
Schwalb, B. et al. Tt-seq maps the human transient transcriptome. Science 352(6290), 1225–1228 (2016).
Michel, M. et al. Tt-seq captures enhancer landscapes immediately after t-cell stimulation. Mol. Syst. Biol. 13(3), 920 (2017).
Villamil, G., Wachutka, L., Cramer, P., Gagneur, J. & Schwalb, B. Transient transcriptome sequencing: Computational pipeline to quantify genome-wide rna kinetic parameters and transcriptional enhancer activity. bioRxiv 659912 (2019).
Natoli, G. & Andrau, J. C. Noncoding transcription at enhancers: General principles and functional models. Annu. Rev. Genet. 46, 1–19 (2012).
Bose, D. A. et al. RNA binding to CBP stimulates histone acetylation and transcription. Cell 168(1–2), 135-149.e122 (2017).
Weinert, B. T. et al. Time-resolved analysis reveals rapid dynamics and broad scope of the CBP/p300 acetylome. Cell 174(1), 231-244.e212 (2018).
Larsson, A. J. M. et al. Genomic encoding of transcriptional burst kinetics. Nature 565(7738), 251–254 (2019).
O'Donoghue, G. P. et al. T cells selectively filter oscillatory signals on the minutes timescale. Proc. Natl. Acad. Sci. U S A 118(9) (2021).
Buniello, A. et al. The nhgri-ebi gwas catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucl. Acids Res. 47(D1), D1005–D1012 (2019).
Nott, A. et al. Brain cell type-specific enhancer-promoter interactome maps and disease. Science 366(6469), 1134–1139 (2019).
Consortium IMSG. Multiple sclerosis genomic map implicates peripheral immune cells and microglia in susceptibility. Science 365(6460) (2019).
Marcucci, S. B. & Obeidat, A. Z. Ebna1, ebna2, and ebna3 link epstein-barr virus and hypovitaminosis d in multiple sclerosis pathogenesis. J. Neuroimmunol. 339, 577116 (2020).
Bäcker-Koduah, P. et al. Vitamin d and disease severity in multiple sclerosis-baseline data from the randomized controlled trial (evidims). Front. Neurol. 11, 129 (2020).
Sun, Y. et al. Critical role of activation induced cytidine deaminase in experimental autoimmune encephalomyelitis. Autoimmunity 46(2), 157–167 (2013).
Mechelli, R. et al. Viruses and neuroinflammation in multiple sclerosis. Neuroimmunol. Neuroinflammation 8, 269–83 (2021).
Park, A. et al. Global epigenomic analysis of kshv-infected primary effusion lymphoma identifies functional. Proc. Natl. Acad. Sci. U S A 117(35), 21618–21627 (2020).
Meng, F. L. et al. Convergent transcription at intragenic super-enhancers targets aid-initiated genomic instability. Cell 159(7), 1538–1548 (2014).
Qian, J. et al. B cell super-enhancers and regulatory clusters recruit aid tumorigenic activity. Cell 159(7), 1524–1537 (2014).
Orrù, V. et al. Complex genetic signatures in immune cells underlie autoimmunity and inform therapy. Nat. Genet. 52(10), 1036–1045 (2020).
Factor, D. C. et al. Cell type-specific intralocus interactions reveal oligodendrocyte mechanisms in MS. Cell 181(2), 382-395.e321 (2020).
Fitz, J. et al. Spt5-mediated enhancer transcription directly couples enhancer activation with physical promoter interaction. Nat. Genet. 52(5), 505–515 (2020).
Gutierrez-Arcelus, M. et al. Allele-specific expression changes dynamically during T cell activation in HLA and other autoimmune loci. Nat. Genet. 52(3), 247–253 (2020).
Simeonov, D. R. et al. Discovery of stimulation-responsive immune enhancers with CRISPR activation. Nature 549(7670), 111–115 (2017).
Halfon, M. S. Silencers, enhancers, and the multifunctional regulatory genome. Trends Genet. 36(3), 149–151 (2020).
Wong, E. S. et al. Deep conservation of the enhancer regulatory code in animals. Science 370(6517), eaax8137 (2020).
Sloan, C. A. et al. Encode data at the encode portal. Nucl. Acids Res. 44(D1), D726-732 (2016).
Fulco, C. P. et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51(12), 1664–1669 (2019).
Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593(7858), 238–243 (2021).
Zhou, Y. et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 10(1), 1523 (2019).
Sheffield, N. C. & Bock, C. Lola: Enrichment analysis for genomic region sets and regulatory elements in R and bioconductor. Bioinformatics 32(4), 587–589 (2016).
Wilson, D. J. The harmonic mean. Proc. Natl. Acad. Sci. U S A 116(4), 1195–1200 (2019).
Umeton, R., Sorathiya, A., Liò, P., Papini, A., Nicosia, G. Design of robust metabolic pathways. In Proceedings of the 48th Design Automation Conference (DAC '11). ACM, New York, NY, USA, 747–752 (2011).

Acknowledgements

The authors would like to thank Dr. Adem Albayrak for the editorial suggestions that improved this manuscript.

Funding

This work was supported by “Progetti Grandi Ateneo” 2020, Sapienza University of Rome. MS and GR are supported by CENTERS, a Special Project of, and financed by, FISM—Fondazione Italiana Sclerosi Multipla. RPU is supported by the National MS Society.

Author information

These authors contributed equally: Renato Umeton and Gianmarco Bellucci.

Authors and Affiliations

Department of Informatics and Analytics, Dana-Farber Cancer Institute, Boston, MA, USA
Renato Umeton
Department of Biological Engineering, Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Renato Umeton
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Renato Umeton
Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
Renato Umeton
Department of Neurosciences, Mental Health and Sensory Organs, Centre for Experimental Neurological Therapies (CENTERS), Sapienza University of Rome, Rome, Italy
Gianmarco Bellucci, Rachele Bigi, Silvia Romano, Maria Chiara Buscarinu, Roberta Reniè, Virginia Rinaldi, Emanuele Morena, Carmela Romano, Marco Salvetti & Giovanni Ristori
Neuroimmunology Unit, IRCCS Fondazione Santa Lucia, Rome, Italy
Rachele Bigi, Maria Chiara Buscarinu & Giovanni Ristori
Department of Neurology, UMass Memorial Health Care, Worcester, MA, USA
Raffaella Pizzolato Umeton
University of Massachusetts Medical School, Worcester, MA, USA
Raffaella Pizzolato Umeton
Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
Raffaella Pizzolato Umeton
Harvard Medical School, Boston, MA, USA
Raffaella Pizzolato Umeton
IRCCS San Raffaele Pisana, Rome, Italy
Rosella Mechelli
San Raffaele Roma Open University, Rome, Italy
Rosella Mechelli
IRCCS Istituto Neurologico Mediterraneo Neuromed, Pozzilli, Italy
Marco Salvetti

Contributions

R.U., G.B., R.B., R.M., M.S., and G.R. conceived and planned the analysis. R.R., V.R., E.M., C.R., S.R., and M.C.B. guided data engineering and database generation from raw data. R.U., R.P.U., and G.B. developed the data resource and analyzed the data. R.U., G.B., R.P.U., G.R., and M.S. wrote the manuscript. R.U., G.B., R.B., and R.M. created all tables and figures. R.B., R.M., M.S., and G.R. supervised the project. All the authors, including S.R., M.C.B., R.R., V.R., E.M., and C.R., contributed to fortnightly discussions on data interpretation and the planning of new analyses. All the authors reviewed and approved the manuscript.

Corresponding authors

Correspondence to Renato Umeton, Marco Salvetti or Giovanni Ristori.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Umeton, R., Bellucci, G., Bigi, R. et al. Multiple sclerosis genetic and non-genetic factors interact through the transient transcriptome. Sci Rep 12, 7536 (2022). https://doi.org/10.1038/s41598-022-11444-w

Received: 14 October 2021
Accepted: 22 April 2022
Published: 09 May 2022
DOI: https://doi.org/10.1038/s41598-022-11444-w
