Single-cell transcriptomics identifies drivers of local inflammation in multiple sclerosis

Single-cell transcriptomics enables unbiased biological discovery and holds new promise for personalized medicine. However, its potential for understanding human diseases by comparing patient vs. control samples in a clinical setting remains largely unexplored. Here, we applied single-cell RNA-sequencing (scRNA-seq) to rare cerebrospinal fluid (CSF) specimens from well-characterized controls and patients with multiple sclerosis (MS) – a prototypic inflammatory disease of the central nervous system (CNS). We thereby generated and validated the first transcriptional atlas of single CSF leukocytes in health and disease. In MS patients, we found an expansion of natural killer cells and late B cell lineages and based on these insights we developed a score with potential diagnostic relevance. Using this analytical approach, we identified and characterized activated phenotypes of MS-derived CSF leukocytes, including an enrichment in T follicular helper (TFH) cell transcriptional signatures. We validated the expansion of such B cell-helping TFH cells in MS patients and demonstrated that TFH cells exacerbate symptoms in an animal model of MS and promote B cell infiltration of the CNS. TFH-dependent B cell expansion may thus drive local CNS autoimmunity in MS. Our study demonstrates how single-cell transcriptomics can identify novel disease mechanisms in a clinically-relevant case-control study design. One Sentence Summary Unbiased single-cell transcriptomics re-defines the transcriptional landscape of cerebrospinal fluid leukocytes and identifies T follicular helper cells as essential drivers of local inflammation in multiple sclerosis.


Introduction 66 67
Single-cell transcriptomics is a transformative and rapidly evolving technology generating biological 68 information at unprecedented resolution and scale. The technique has mostly been employed to re-69 define the heterogeneity of complex tissues derived from healthy rodents or humans (1,2). The novelty 70 of these studies has mostly been limited to the identification of previously unrecognized cell types or 71 cell phenotypes (3) and the regulation of their development. Diseased tissues have also been analyzed 72 with single-cell technologies and the cancer field has seen especially rapid adaptation of these methods 73 Next, we analysed our dataset for disease-specific differences in CSF cell composition ( Fig. 2A). 156 Overall, inter-donor variability was high (Fig. S2B). Despite this variability, there were significant 157 compositional differences between the MS and control cohorts (Methods). Binomial regression 158 modelling of scRNA-seq cluster membership counts reflects a significant decrease in the proportion of 159 non-classical monocytes relative to classical monocytes in MS (Wald test P < 10 -8 ; Fig. 2B). A 160 decreased ratio of non-classical / classical monocytes was confirmed by flow cytometry (t-test P < 0.01; 161 were detected in samples of all 4 MS patients but were virtually absent from control-derived CSF 169 samples ( Fig. 2A and Fig. S2B). 170 The expansion of B lineage cells was a uniquely MS-specific feature (Fig. S4C) and we therefore 171 examined these clusters in greater detail. In the B cell cluster, IGHD (marker of naïve B cells) and 172 IGHM genes were dominantly expressed in 5% and 34% of B cells, respectively (Fig. S4A,D). The 173 expression of heavy chain genes was dominated by IGHG genes in the plasma cell cluster (83%; Fig.  174 S4B,D) while fewer cells expressed IGHA genes (encoding IgA chains). This verifies that the vast 175 majority of plasma cells in the CSF are class-switched. In both B cells and plasma cells the ratio between transcripts. Abundance of such proliferating T helper cells was increased in MS-derived samples (Fig. 205 2G) potentially indicating local expansion of CD4 + T cells in the CSF in MS. Three of the remaining 206 clusters exhibited a memory-like phenotype (SELL int/lo CCR7 int/lo ) and were transcriptionally best 207 described as central memory (up-regulation of CD69, FDR < 0.05; cm_CD4: CD69 hi CD44 hi and 208 CD27 hi ), as early effector memory (up-regulation of IL7R and CD69, FDR < 0.05; eem_CD4: 209 CD69 hi CD44 hi and CD28 hi ), and as late effector memory (up-regulation of IL7R, FDR < 0.05; 210 lem_CD4: CD69 int CD44 hi and CD28 lo ) CD4 + T cells. These clusters showed no significant disease-211 specific expansion or contraction, after accounting for donor variability. Flow cytometry detected no 212 significant differences in the proportion of total CD4 + vs. CD8 + T cells in MS (Fig. 2H) indicating that 213 changes to T cells in MS are subtle, occurring at the subset level. 214 While the division of the CD4 + T cells into sub-clusters was informative in this context, we found that 215 the resulting clusters are not very well distinguished from one another (Table S4) and that CD4 + T cells 216 transcriptionally instead form a continuum of cell states, in accordance with previous scRNA-seq 217 studies (30, 36). Indeed, independent of our clustering analysis, we explored how transcriptional 218 signatures vary across the entire CD4 + T cell population, using VISION (an updated R version of 219 FastProject (37) https://github.com/YosefLab/VISION). This analysis highlighted a continuum of 220 transcriptional CD4 + T cell states that span multiple sub-clusters in terms of T cell activation and 221 memory (Fig. S6A,B), thereby providing a view of the data complementary to our analysis above. 222 Analysis of MS-related transcriptional changes of CD4 + T cells may therefore benefit from techniques 223 that do not depend on data-driven partitions, e.g., clusters. These insights motivated our development 224 of CSEA below (see also Fig. S7,8 expansion, but offered limited additional insight because unsorted cells were profiled. We therefore 232 MHC class I genes (i.e. HLA-A, HLA-C, B2M) and of IL32 in MS patients indicating increased activation 261 (23). In accordance, GSEA (41) showed enrichment of pathways associated with protein synthesis (e.g. 262 peptide chain elongation; P < 0.01) and thus cellular activation in CD4 + T cells and naïve CD8 + in MS 263 (Table S6). The CD8 + T cell clusters showed higher expression of genes associated with activation and 264 cytotoxicity (GZMK, GZMA, PRF1 encoding perforin 1) and GSEA identified antigen presentation 265 pathways in activated CD8 + T cells (P < 0.01, Table S6). Overall, this suggests higher activation and 266 cytolytic capacity of cytotoxic CSF cells in MS. Both classical and non-classical monocytes featured higher 267 expression of genes associated with antigen presentation (e.g. CD74, HLA-DRB1) and with migration (e.g. 268 ITGB2 encoding integrin-β2). The non-classical monocyte cluster also showed signs of increased secretory 269 activity (e.g. induction of GRN encoding granulin) and GSEA found antigen presentation and interferon 270 signaling pathways enriched in this cluster (P < 0.01, Table S6). The mDC cluster showed an increased 271 expression of MHC class II (i.e. HLA-DRA) and MHC class I genes (e.g. HLA-A) and induction of CD1E. 272 This indicates a propensity for lipid antigen presentation. GSEA identified lymphocyte costimulation 273 pathways in this cluster (P < 0.01, Table S6). We did not observe statistically significant disease-specific 274 transcriptional changes in the NK, pDC, and Bc clusters. This may -at least in part -be due to low cell 275 numbers in these clusters. Because plasma cells were virtually undetected in control patients estimation of 276 differential expression effect was prohibited. In conclusion, our analysis of transcriptional changes 277 individually in each cell cluster reflects an ongoing immune cell activation in the CSF in MS. 278

279
Cell set enrichment analysis helps identifying disease-specific transcriptional changes 280 281 Our approach above used conventional single-cell analysis steps: 1) identifying cell clusters, 2) 282 obtaining differentially expressed (DE) genes between disease-states for every cluster, 3) using GSEA 283 to test for over-representation of known gene-sets and ascribe biological meaning. We speculated that 284 this approach would be particularly insensitive to gene signatures or cell states that are poorly 285 represented by tight clustering -as observed in CD4 + T cell subsets. For example, a certain functional 286 property may be specific to MS but only be present in a small subset of cells within a cluster; these 287 patterns could be easily missed in a cluster-wide MS vs. control comparison. We therefore developed a 288 novel procedure -cell set enrichment analysis (CSEA) -which reutilizes the GSEA test for working 289 on ranked lists of cells rather than genes (Methods, Fig. S7). In this procedure, cells in a cluster are first 290 ordered by a transcriptional phenotype of interest (e.g., summed expression of genes in a pathway). The 291 statistical test can then detect cases in which a subset of cells from one group (e.g., MS) exhibit 292 unusually high or low values of that transcriptional phenotype compared to cells from the second group 293 (e.g., control). We refer to these exceptional groups of cells as core cell sets. 294 We used this technique for a more comprehensive and clustering-free exploration of disease-specific 295 transcriptional changes in the CD4 + T cell compartment. As a source for transcriptional phenotype, we 296 used signature scores from the VISION pipeline (Fig. S7). VISION signature scores are calculated by 297 summing the expression of specific sets of genes, which can reflect a dichotomy between conditions of 298 interest (e.g., naïve vs. memory T cell state) or a certain cellular function (e.g., signaling through 299 interleukin (IL)-2; see Methods). The gene signatures were obtained from databases such as MSigDB 300 and NetPath (42, 43) and are based on literature curation and on mining of large numbers of published 301 microarray and RNA-seq studies (Methods). 302 Our CSEA testing procedure returned lists of core cell sets driving statistically significant signature 303 enrichments in MS (P < 0.01, Bonferroni adjusted, Table S7). We identified core MS cell sets (Methods) 304 driving enrichments for both an exhausted-versus-naïve CD4 signature (44) and a memory T cell 305 signature (45) (Fig. S8), both exhibiting considerable overlap with the memory sub-clusters in Figure  306 2D. Importantly, the memory cell clusters exhibited no significant MS-specific differential abundance 307 in our standard analysis above, but CSEA highlights a subset of these cells with pronounced memory 308 or exhausted phenotypes that are particularly abundant in MS CSF. This argues for persistent T cell 309 activation in the CSF in MS. We further identified several other MS core cell sets with exceptionally 310 high expression of transcriptional signatures of T helper cell (Th)1 (46), induced (i)Treg (47), and T 311 follicular helper (TFH) cells (48,49). Importantly, the cells in each of these three core sets do not 312 significantly cluster in a transcriptome-wide analysis (VISION consistency testing P-value > 0.1), 313 suggesting that cluster-based analyses are not well suited for capturing this layer of cell phenotype; e.g., 314 cells expressing a Th1-polarized transcriptome are spread across both naive and memory clusters. Our 315 novel analytical approach can therefore decouple clustering of cells from disease-state enrichment of 316 cells, providing a new framework for interpreting complex scRNA-seq datasets. 317 Overall, these CSEA results emphasize an expansion of CD4 + T cells with a Treg, Th1, and TFH 318 phenotype in MS. The Th1 result could indicate a greater role for Th1 versus Th17 in MS disease in the 319 CSF. Interestingly, TFH cells are known to drive B cell maturation. This lead us to hypothesize that an 320 increase in TFH abundance is responsible for the differences we observed in the B cell compartment of 321 the CSF.  Table S4). The proportion of activated TFH cells expressing PD-1 and ICOS was also increased in MS 332  (Table S8). This indicates that numerical differences in TFH cell abundance 340 are more pronounced than transcriptional changes of TFH cell phenotype. To investigate this further, 341 we performed GSEA and found an enrichment of gene sets associated with T helper cell memory and 342 pathogenicity in MS-derived TFH cells (P < 0.01, Bonferroni correction; Table S9). Genes often recurring in these enriched gene sets (Fig. S9) were associated with cytotoxicity and cell death (e.g. 344 GZMA, GZMK, CASP3, CASP4) and with co-inhibitory function (e.g. KLRG1, TIGIT, CTLA4). In 345 accordance with our CSEA results, this suggests that pathogenic TFH cells expand in the CSF in MS 346 patients. TFH cells are essential for the maturation of plasma cells and memory B cells. TFH expansion 347 may thus contribute to the local interaction between T and B cells and thus potentially drive the disease. we extracted leukocytes infiltrating the CNS at the peak of EAE we found that the proportion of pro-361 inflammatory IL-17 producing CD4 + T cells was reduced in CD4 Cre Bcl6 fl/fl mice indicating a lower 362 degree of CNS tissue destruction and inflammation in the absence of TFH cells (Fig. 5D). Next, we 363 tested how the absence of TFH cells influenced B cells in the CNS and found a lower proportion of 364 total B cells (B220 + CD3 -) infiltrating the CNS in CD4 Cre Bcl6 fl/fl mice by flow cytometry (Fig. 5E). We  In this study we applied single-cell transcriptomics to rare and clinically relevant CSF specimen from 373 MS patients and controls. We thereby create the first comprehensive map of the cellular composition 374 and transcriptional phenotype of CSF cells. In analysing our data, we observed that -transcriptionally 375 -CD4 + T cells are best described as a continuum of cell states rather than clearly defined subsets or 376 clusters (36). This observation together with considerable inter-donor heterogeneity necessitated 377 development of CSEA that facilitates extracting disease-specific mechanisms from complex scRNA-378 seq data, by focusing (in a data-driven way) on the most relevant subsets of cells. Beyond the specific 379 application in this paper, we therefore expect that methods such as CSEA will be essential for 380 conducting future single-cell transcriptomics studies with a case vs. control design.

functional link between TFH cells and B cells in neuro-inflammation was not previously 399
reported. Notably, the gene encoding the TFH marker CXCR5 is a genetic risk locus for MS (59). 400

Previous studies suggest that TFH cells and B cells in the CSF could be derived from meningeal sources. 401
Chronic ongoing CNS inflammation induces ectopic lymphoid tissue (eLT) in the affected tissue in 402 many autoimmune diseases and is thought to be the site of local auto-antibody production (60). In MS,

Patient recruiting and inclusion 443
A total of 26 treatment-naive patients with MS or clinically isolated syndrome (CIS) receiving a lumbar 444 puncture (LP) for diagnostic purposes, were prospectively recruited (Table S1). The control group 445 consisted of 22 patients diagnosed with idiopathic intracranial hypertension (IIH) ( Table S1). Patients 446 were recruited in three consecutive cohorts. CSF cells from cohort 1 were used for unsorted single-cell 447 RNA-seq (6 IIH vs. 6 MS patients). CSF cells from cohort 2 were analysed by flow cytometry only (7 448 IIH vs. 11 MS patients), and cells from cohort 3 were flow sorted for RNA-seq of CD3 + CD4 + CXCR5 + 449 TFH cells (9 IIH vs. 9 MS patients) (Table S1 and  Exclusion criteria for all patients were: 1) immunologically relevant co-morbidities (e.g. rheumatologic 459 diseases), 2) severe concomitant infectious diseases (e.g. HIV, meningitis, encephalitis), 3) pregnancy 460 or breastfeeding, 4) younger than 18 years, 5) mental illness impairing the ability to give informed 461 consent, 6) artificial blood contamination during the lumbar puncture resulting in >200 red blood cells 462 / μl. patients whose diagnostic work-up revealed a diagnosis other than MS / IIH within four weeks of 463 clinical follow-up were retrospectively excluded (Fig. S1C). The following diagnostic tests were 464 performed in all MS patients to exclude differential diagnoses: PCR for cytomegaly virus, Ebstein-Barr The maximum of CSF cells used for input was 10,000 cells. If total available CSF cell numbers were 486 lower than 10,000 cells, all available cells were processed. On average 5,917 cells ± 1,505 SD (control 487 6,167 cells ± 2,614 SD vs. MS 5,667 cells ± 1,506 SD) CSF cells were used as input per donor. Single-cell suspensions were loaded onto the Chromium Single Cell Controller using the Chromium 501 Single Cell 3' Library & Gel Bead Kit v2 (both from 10X Genomics) chemistry following the 502 manufacturer's instructions. Sample processing and library preparation was performed according to 503 manufacturer instructions using AMPure beads (Beckman Coulter). Sequencing was carried out on a 504 local Illumina Nextseq 500 using the High-Out 75 cycle kit with a 26-8-0-57 read setup. Average 505 sequencing depth was 51,064 ± 13,041 SEM reads/cell (Table S3). 506

Single-Cell Sample Filtering 522
Initial exploratory data analysis identified one MS sample and one IIH sample whose clustering did not 523 overlap with other samples (data not shown). This tight clustering suggested either strong batch effects 524 or significant contamination. Both samples were excluded from further analysis, leaving 5 control-and 525

MS-derived samples. 526
Nine barcode-level quality control (QC) metrics were computed for the unfiltered 10x Cell Ranger 527 output: (1) number of unique molecular identifiers (UMIs), (2) number of reads, (3) mean reads per 528 UMI, (4) standard deviation of reads per UMI, (5) percent of reads confidently mapped to the gene, (6) 529 percent of reads mapped to the genome but not a gene, (7) percent of reads unmapped, (8) percent of 530 UMIs corrected by the Cell Ranger pipeline, and (9) the number of cell barcodes corrected by the Cell 531 Ranger pipeline. These metrics were used for filtering and normalization. We applied the gene and 532 sample filtering using a scheme previously described (81). This involved four steps:  After sample filtering, we loaded the normalized log-transformed UMI matrix into the Seurat analysis 585 pipeline (84). Following data scaling and PCA, we clustered the cells in the first ten principal 586 components using the Seurat::FindClusters function. Clustering resolution was set to 0.6. Identical 587 options were used for the CD4 + -only Seurat analysis (see below), defining subclusters of those cells. 588 We manually annotated clusters based on marker gene expression and enrichment analyses described  Binomial regression modelling was applied to compare the relative sampling of classical and non-620 classical monocytes in monocyte fraction of MS and control CSF samples. The classical fraction of 621 monocytes increased significantly from 17% in control donors to 32% in MS donors (Wald test P < 10 -622 8 ). 623

Cluster-specific expression analysis 624
We performed one versus all comparisons following each clustering analysis in order to annotate the 625 clusters. One versus all differential expression (DE) tests P-values were used to rank genes by the extent 626 they are up-regulated in one cluster over all others. Tests were performed separately for each donor 627 sample with at least 10 cells in the target cluster. qPC factors used for normalization above were 628 incorporated into a linear predictor for limma-voom DE testing. Results for each donor sample were 629 combined in multiple ways, calculating median log fold changes, meta-analysis P-values for one-sided 630 tests using Stouffer's method, and irreproducible discovery rates (IDR) (86) for two-sided tests using 631 the est.IDRm tool in the scRAD package (87) for all genes and comparisons ( After re-clustering the CD4 + T cell cluster (CD4_Tc), the marker criteria above identified contaminating 637 populations with markers of non CD4 + T cell lineages. These cells were erroneously clustered together 638 with CD4 + T cells in the initial clustering and we named them remaining CD8 + T cells (r-CD8) and 639 remaining monocytes (r-mono). 640 641

642
Remaining CD4 + subclusters were annotated by joint considerations of i) significant (FDR < 0.05) and 643 large log2 fold change greater than 0.1 and ii) Mean expression of known markers (Fig. 2F). 644

Per cluster case-control comparison 645
Cluster-specific gene expression differences between MS and control were also assessed. Donors were 646 only included in a comparison if 10 or more cells from the target cluster were detected in the donor's 647 sample. All pairings of MS donors with control donors were considered (up to 16). For each valid case-648 control pair, DE analysis was performed using limma-voom, as in the marker analysis, but comparing 649 case cells against control cells. Log fold change was summarized by the median of log fold changes 650 estimated across the donor pairs. Meta-analysis was performed on all possible pairings of cases and 651 controls (up to 4! = 24); the median meta-analysis P-value was reported. IDR modelling was applied at 652 the pair level, modelling the reproducibility of up to 16 replicate significance signals (Table S4). Some 653 genes are very lowly expressed across individual clusters, resulting in unstable statistical estimation for 654 those genes. Genes were filtered before DE if they had mean un-normalized UMI counts below 0.05. 655 656

Gene Set Enrichment Analysis (GSEA) 657
After deriving lists of differentially expressed genes, we sought to uncover enrichment for particular 658 gene sets to capture biological differences between samples. We applied GSEA tests (41) to all single-659 cell differential expression tests returning cluster specific gene expression (i.e. genes expressed by one 660 cluster vs. other clusters, Table S5) and disease specific gene expression (i.e. genes expressed within 661 one cluster in MS cells vs. control cells, Table S6). We used signed significance scores based on meta-662 analysis P-values as gene signals and applied the Bonferroni adjustment to control the FWER for each 663 category of hypotheses (i.e. test type, cluster, and sign). Sets considered in this analysis include all 664 MSigDB C7 signature sets and all curated T cell signature sets described previously (36) with 10 or 665 more genes quantified in the present study; "UP" and "DN" signature subsets were tested separately. 666 The initial description of GSEA recommend simulating a null distribution for the GSEA test statistic at 667 the gene level (e.g. recomputing lfcs for shuffled sample labels) (41). This approach is computationally 668 costly in our case; in this analysis we generated null distributions of the GSEA test statistic by shuffling 669 gene set memberships, assigning empirical one-sided P-values based on simulation (https://CRAN.R-670 project.org/package=gsEasy). 671 672

Cell Set Enrichment Analysis (CSEA) 673
For the CD4 + -only analysis we considered a novel adaptation of the GSEA method, applying the 674 technique to cell sets: CSEA (illustrated in Fig. S7). CSEA is a hypothesis testing method for simultaneously uncovering enrichments and identifying subsets of cell sets of importance. In this 676 procedure, a collection of cells is first ordered by a transcriptional phenotype of interest (e.g., sum 677 expression of genes in a pathway). The resulting statistical test is sensitive to cases in which only a 678 subset of cells from one group (e.g., MS) exhibit unusually high or low values of the transcriptional 679 phenotype. The input to this method is a list of N cells, rank-ordered by some input signal. Our analysis indicating that some cells have higher expression of genes characterizing the naïve state and lower 687 expression of genes characterizing the memory state. Using the notation previously described (41) we 688 will use r j to denote the cell j's signature score; indices have been sorted so that rj > rj+1 (alternatively in 689 increasing order: rj < rj+1). The test involves considering all cells up to a specific position, i. A "hit" 690 score is defined as the cumulative sum of signature score magnitudes (optionally exponentiated by 691 parameter p: |rj| p ) for members of cell set S, divided by the sum over all set members in the list. A "miss" 692 score is similarly calculated for non-members of S, but without weighing by signature score magnitudes. 693 The CSEA enrichment score (ES) is defined as the maximum of the difference between the running hit 694 score and running miss score with respect to index i. When p=0, the ES reduces to a one-sided KS test 695 statistic for differential signature analysis between cell sets. We apply the same permutation scheme as 696 described for GSEA above. For p>0, CSEA cannot be seen as a simple differential signature test: CSEA 697 tests for enrichment of a cell set at the high tail (or low tail) of the signature score distribution, but 698 additionally weighs the set elements according to their signature value. This reduces the effects of low- goal of this approach is to identify core sets of cells that drive each biological condition's enrichment 708 for high or low signature values (Fig. S7). Contaminating sub-populations in the CD4_Tc cluster, were 709 removed prior to CSEA.  We performed PCA on the scaled log-transformed normalized data for visualization. DE between MMS 756 and IIH donors was performed with limma-voom, using RUVg factors and batch in the model to adjust 757 for unwanted variation. Per-gene DE significance scores were computed from log-transformed P-values 758 and used for GSEA enrichment testing. Sets considered for testing included numbers 3,5, and 6 described in the VISION section. The 42 most frequent core members of the significant enrichments 760 (Bonferroni adjusted P-value less than 0.01) -genes driving 7 or more of these enrichments -were 761 selected and their normalized log values were correlated against each-other and represented in a sorted 762 heatmap using pheatmap defaults. 763 764 Expression deconvolution using scRNA-seq data 765 Raw UMI mean counts per cluster were used as input for deconvolution. Cibersort was used for RNA 766 expression deconvolution (38) on the E-MTAB69 dataset described previously (19). We found that 767 when using highly similar cell clusters as input for deconvolution (e.g., CD4_Tc together with CD4 + T 768 cell sub-clusters) lower abundance clusters (e.g., CD4 + T cell sub-clusters) were not identified due to 769 high transcriptional overlap. We therefore excluded the CD4_Tc cluster from deconvolution. A grades for clinical signs of EAE using the following scoring system: 0, healthy; 1, paralyzed tail tip; 2, 789 paralyzed tail; 3, waddling; 4, hind legs drag on the ground; 5, butt on the ground; 6, one paralyzed hind 790 leg; 7, both paralyzed hind legs; 8, one paralyzed front leg (criterium to stop EAE); 9, both paralyzed 791 front legs: 10, moribund or death. Detailed refinement procedures were performed according to the 792 impairments of the mice. Mice with a score of >7 were euthanized. GraphPad Prism 5 was used for 793 statistical analysis of all mouse-related data.  Table S1. Summarized information about patients in the present study 838 Table S2. Standard CSF parameters and MS disease features of patients in the present study 839 Table S3. Technical information of scRNA-seq results 840 Table S4. Merged results of the scRNA-seq analysis 841 Table S5. Gene set enrichment analysis (GSEA) results for genes differentially expressed by clusters 842         Scheme depicting the scRNA-seq analysis workflow utilized in this study. Analysis begins with 10X (10X Genomics) Cell Ranger processing and cell-level QC (quality control) metric evaluation, followed by SCONE data filtering and normalization, Seurat dimensionality reduction, clustering and visualization. Results from these analyses are input into VISION for signature calculation and consistency testing. These signatures may be used for CSEA testing. Differential abundance analysis is performed based on the Seurat clustering, and various forms of differential expression testing, including one v. all, "marker" analysis and cluster-specific case v. control analysis are performed using a metaanalysis approach that supports IDR modelling with scRAD tools. GSEA testing is used to ascribe biologic meaning to differential expression results, motivating further subclustering analysis, in which a cluster is analysed using an identical analytical procedure. Table Legends   Table S1. Summarized information about patients in the present study.

Supplementary
Clinical characteristics (average age, sex) of all control (IIH, n=22) and multiple sclerosis (MS, n=26) patients included in the study after screening are depicted. Numbers of excluded patients for each group are also shown. All included patients were divided over three cohorts, cohort 1: CSF samples used for single cell RNA-seq. (6 control vs. 6 MS), cohort 2: CSF samples analysed by flow cytometry only (7 control vs. 11 MS) and cohort 3: CSF samples flow sorted for RNA-seq of CD3 + CD4 + CXCR5 + TFH cells (9 control vs. 9 MS).   Genes most differentially expressed in clusters identified in the first clustering including all cells (All) and in the secondary CD4 + T cell clustering (CD4) are listed. Depicted are the Ensembl IDs of the Genes tested (Ensembl ID), the common Gene names (Gene Symbol), the cluster analysed (Cluster), the median log2 fold change (Median Log2 Fold Change), the irreproducible discovery rate (IDR), the statistical significance (Meta-analysis P-value) and the false discovery rate (Meta-analysis FDR).
Candidate genes were defined as reaching a threshold of either log2 fold change and IDR and FDR, or IDR and FDR, or Median Log2 Fold Change and FDR. Additionally, candidate genes from different DE analysis comparing one cluster to all others (One vs. All (Marker)) and comparing same clusters between Multiple sclerosis patients (MS) and Control patients (IIH) (MS vs. IIH (Exposure)) are depicted. Table S5. Gene set enrichment analysis (GSEA) results for genes differentially expressed by clusters.
Gene set enrichment analysis (GSEA) was performed on all marker genes for every cluster after the first clustering using all cells. Depicted are enriched Gene Sets (Signature), reference links to the GSEA data base (Origin), GSEA Enrichment Scores (EScore), statistical significance (PValue), simulated Pvalues using bonferroni correction (sim_p_bonferroni), Cluster analysed (Cluster), differential analysis type (DEType), direction of Gene set enrichment (Sign) and signature containing collections