Single cell RNA sequencing reveals ferritin as a key mediator of autoimmune pre-disposition in a mouse model of systemic lupus erythematosus

Systemic lupus erythematosus (SLE) is a devastating autoimmune disorder characterized by failure of self-tolerance with resultant production of autoreactive antibodies. The etiology of this syndrome is complex, involving perturbations in immune cell signaling and development. The NZBWF1 mouse spontaneously develops a lupus-like syndrome and has been widely used as a model of SLE for over 60 years. The NZBWF1 model represents the F1 generation of a cross between New Zealand Black (NZB) and New Zealand White (NZW) mice. In order to better understand the factors that contribute to the development of autoimmunity, single cell RNA sequencing was conducted using the bone marrow from female NZBWF1 mice prior to the development of overt disease. The results were contrasted with single cell RNA sequencing results from the two parental strains. The expected findings of B cell abundance and upregulation, and evidence of interferon signaling were validated in this model. In addition, several novel areas of inquiry were identified. Most notably, the data showed a marked upregulation of the ferritin light chain across all cell types in the NZBWF1 mice compared to parental controls. This data can serve as a gene expression atlas of all hematopoietic cells in the NZBWF1 bone marrow prior to the development of autoimmunity.

Data analysis. Data analysis was performed using R and a variety of Bioconductor packages 10-30 following many of the conventions described by Amezquita and colleagues in the book Orchestrating Single Cell Analysis with Bioconductor 31 . The code associated with this analysis is available on GitHub (https:// github. com/ styou nes/ SLE_ scRNA seq).
HDF5 (.h5) files for each sample were loaded using the read10xCounts function of the DropletUtils package. Most of the following downstream analyses were performed using a combination of the scran and scater packages. Per cell quality control metrics included total detected unique molecular identifies (UMI), number of expressed genes, and the percentage of mitochondrial genes. Cells with less than 1000 total UMIs, less than 500 unique genes, or greater than 10% expression of mitochondrial genes were removed from the dataset. There was no difference in quality of cells among batches or across strains. These metrics exhibited a bimodal distribution ( Supplementary Fig. 1A) with a clear inflection in between low-and high-quality cells, guiding the selection of thresholds. In order to ensure that highly metabolically active cells were not discarded by this approach, the total library size (i.e. total UMIs) was plotted against the percentage of mitochondrial genes for each cell (Supplementary Fig. 1B). Given the fact that there were no cells with large library sizes (presumably high-quality cells) with concomitant large proportion of mitochondrial genes, it was concluded that this approach did not inadvertently discard any metabolically active cell populations. Finally, the enrichment of gene subsets within discarded cells was checked and, observing no such enrichment, demonstrated that quality control was not discarding a distinct population of cells ( Supplementary Fig. 1C). Doublets (droplets which contained two or more cells) were identified using a simulated doublet approach whereby artificial doublets are constructed from the dataset and cells which lie close to these artificial doublets in high-dimensional space are removed (guilt by association), as implemented in the scDblFinder function with the following non-default parameters: nfeatures = 750, propRandom = 1.
Gene expression was normalized by pooling counts from related cells and calculating a size factor for each pool; cells were then deconvolved into cell-based size factors for normalization of each cell's expression profile, using the functions quickCluster and computeSumfactors as implemented in the scran package. Genes with a high variance of expression after applying a correction factor for abundance were selected for use in several downstream analyses, most notably, dimensionality reduction. The top 20% of highly variable genes were selected.
Cell types were annotated using a curated set of marker genes garnered from the Cell Marker database (http:// bio-bigda ta. hrbmu. edu. cn/ CellM arker/) (Supplementary Table 1). Equipped with these gene sets, an assignment score for each permutation of cell and cell type was computed using the function AUCell with the following nondefault parameters: aucMaxRank = 750. Diagnostic plots were inspected and score thresholds manually adjusted for optimal assignment in our dataset.
Differential gene expression among strains was determined by aggregating cell types across samples using the function aggregateAcrossCells implemented in the package edgeR. Samples with less than 10 cells for the given cell type were removed. Differential gene expression was computed by the function pseudoBulkDGE from the scran package. Briefly, this function loops across cell types and performs differential expression using www.nature.com/scientificreports/ www.nature.com/scientificreports/ a quasi-likelihood method as implemented in the package edgeR. For the design matrix NZB-NZBWF1-NZW, the coefficients were: − 0.5, 1, − 0.5. Similarly, differential abundance of cell populations was computed using quasi-likelihood method with a design matrix and coefficients as above. Phase of the cell cycle for each cell was inferred based on the expression of stereotypic cell cycle genes using the package cyclone.
Subclustering. B-cell subclusters were determined as follows. After appropriate subsetting of the global dataset, dimensionality reduction by principal component analysis was repeated. In order to minimize technical noise while maximizing identification of true biological differences, clustering was run using a subset of these principal components in keeping with the method described by Amezquita et al. 30 . The number of components was determined by first computing the number of clusters vs. the number of principal components used in clustering. Next, the number of principal components which yields no more than PC + 1 clusters was selected, as it represents the inflection point between over-and under-clustering. A k-nearest-neighbor approach was applied to the dataset as implemented in the function buildSNNGraph in the igraph package (Supplementary Fig. 2A).
Cluster stability was evaluated based on both cluster modularity and bootstrapping approaches ( Supplementary  Fig. 2B,C). One subcluster (label 1) was removed as this subcluster had a high library size and expressed both myeloid and lymphoid markers, suggesting it consisted mostly of doublets. Differential abundance and expression across B-cell subclusters was determined similar to that described above for other cell types.

Results
A high cell viability was obtained using the aforementioned isolation method, as NZB, NZW, and NZBWF1 mice had an average of 88, 94, and 94 percent viability, respectively. After initial quality control, 31,053 cells remained for downstream analysis. All three major hematopoietic lineages (immune, erythroid, and megakaryocytic) were represented across each individual strain (Fig. 1A,B). Interestingly, the gene expression of some NZBWF1 cell types-for example, erythroid and neutrophils-fell precisely in between the two parent strains (Fig. 1B), suggesting that the phenotype of the NZBWF1 strain is contributed to equally by both parent strains. By assessing the expression of stereotypic cell cycle genes, the phase of the cell cycle was assigned to each cell in our dataset (Fig. 1C). As expected, intermediate cell types (e.g. granulocyte-monocyte progenitors) exhibited progression through the cell cycle. There was no significant difference in the number of cells in each phase when compared amongst different strains (data not shown). Next, the identification of any particular cell type exhibiting differential abundance in the NZBWF1 strain as compared to the NZB or NZW strain was examined. As shown in Table 1, B-cells and plasma cells from NZBWF1 mice exhibited an approximate onefold increase in abundance, supporting the key role of auto-antibody production in the SLE-like phenotype of this strain. Given this finding, a sub-cluster analysis of this B-cell population was conducted in order to discern whether any particular B-cell subtype is enriched in the NZBWF1 strain. Based on unsupervised clustering, there were 14 subtypes of B-cells present in the dataset ( Fig. 2A). There was no significant difference in the abundance of any of these subtypes across strains (Fig. 2B). However, it is possible that the dataset may not be sufficiently powered to detect significant differences in these sub-populations. Similar to results for the entire dataset, NZBWF1 B-cells were not enriched in any given phase of the cell cycle when compared to the parent strains (data not shown).
In order to assess gene expression, a pseudo-bulk analysis used to identify genes which were differentially expressed in the NZBWF1 as compared to NZB and NZW strains within each cell type. Overall, very few genes were differentially expressed ( Table 2). For example, in B-cells only 14 genes were significantly upregulated and only 3 were downregulated. Thus, the unique auto-immune phenotype of the NZBWF1 strain may be driven by relatively small alterations in gene expression as compared to its parent strains. Given the relatively small www.nature.com/scientificreports/ number of genes differentially expressed, gene ontology and reactome analyses failed to identify any significant enrichment for gene ontology terms or reactome pathways, respectively. The specific genes which were differentially expressed were remarkably consistent across all cell types (Fig. 3, Table 3, and Supplementary Table 2). Among the most significant genes was Ftl1 and Ftl1-ps1, encoding ferritin light polypeptide 1 and ferritin light polypeptide 1 pseudogene 1, respectively. Additional genes which exhibited consistent upregulation across multiple cell types were Ifitm2, Apobec3, and Ifi202b, all genes which are regulated by the interferon pathway. Ctse, the gene encoding Cathepsin E, a protein involved in MHC class II antigen presentation, was also consistently upregulated.
Several genes of cryptic functional significance were similarly upregulated including Tpm3-rs7, Htatip2, and Gm42031. Tpm3-rs7 (tropomyosin-related sequence 7) is a heretofore uncharacterized protein with putative actin filament binding. Gm42031 is an uncharacterized locus (postulated to be a lncRNA) whose downregulation in macrophages has been previously linked to neuroinflammation 32 . However, the dataset of this highly inflammation-prone strain shows that it is upregulated. Htatip2 is an oxidoreductase with a role in nuclear import signaling. There were no genes which exhibited consistent downregulation across all cell types.
Some genes were selectively altered in only specific cell types. For example, Ly6c2 expression was upregulated across myeloid cell types. Monocytes over-expressed several complement genes namely, C1qa and C1qb. Dendritic cells over-expressed Klk1 (kallikrein). As expected, there were very few T cells in the bone marrow;

Discussion
SLE is a multifactorial syndrome which involves a complex interaction of genetic susceptibility and environmental factors/exposures. By conducting single cell RNA sequencing on the bone marrow of the SLE-prone NZBWF1 mouse strain and contrasting it to its two parent strains, neither of which develop overt autoimmunity to the degree of their progeny, this study sought to unravel immune factors which contribute to this syndrome. Importantly, this analysis was conducted prior to the development of the disease phenotype, allowing for the identification of potential predisposing factors which contribute to later autoimmunity. First, the results showed enrichment of B cells and plasma cells in the bone marrow of NZBWF1 mice. These B cells expressed genes associated with activation and inflammation while the associated plasma cells exhibited upregulation of immunoglobulin G heavy chain. Taken together, these results underscore the central role of B cell activation and antibody production to the autoimmune phenotype of NZBWF1 33,34 . A prior single cell RNA sequencing study of human peripheral blood in SLE 35 similarly identified alterations in B cells, noting enrichment of particular subpopulations of memory and activated B cells within the peripheral blood of SLE patients.
The interferon pathway is a well-known mediator of SLE in both mouse models 36 and humans [37][38][39] . The finding of interferon activation further supports a key role for this pathway in the NZBWF1 and validates its use as a small animal model of SLE. The same study cited above 35 also identified upregulation of the interferon pathway as of particular importance across a variety of cell types, including T-cells, monocytes, and dendritic cells in patients with SLE.
Of particular importance, our data showed marked upregulation of the ferritin light chain across all cell types in NZBWF1 mice relative to parental controls. Prior quantitative trait loci mapping identified a QTL associated with the SLE-like phenotype encompassing the Ftl1 locus on chromosome 7 6,8 . Playing a central role in iron uptake and storage 40 , the ferritin light chain is also a well-known acute phase reactant with important immune signaling actions 41 . Indeed, the ferritin light chain has been shown to be the primary circulating form of ferritin and mediates many of its immune functions 42 . Canonically, monocytes are the primary source of ferritin light chains in this context. Indeed, in the setting of autoimmunity, high levels of circulating ferritin are associated with the macrophage activation syndrome 43,44 . Thus, the marked upregulation across all cell types in this dataset is notable and represents a potentially novel avenue of scientific and therapeutic investigation. Whether ferritin upregulation is a cause of underlying NZBWF1 autoimmune predisposition remains to be elucidated.
Several genes of enigmatic function were also identified, suggesting potentially interesting areas of further study. Gm42031 was previously identified in a study of microglial activation. In that study, Wilson et al. identified its downregulation to be associated with neuro-inflammation 32 . Thus, the current finding of marked upregulation in the NZBWF1 strain in the context of inflammation is intriguing. Further characterization of Gm42031 and its functional impact on immune regulation may be a novel area of investigation.
To our knowledge, this is the first study to leverage single cell RNA sequencing in examining the bone marrow of the NZBWF1 mouse strain. Prior studies in this model have examined single cell transcriptomics in the kidney and lung of affected mice later in the disease course 45,46 . There, the author's identified extensive immune cell infiltration and activation, consisting of nearly all major immune cell types. Thus, our study uniquely examines the developmental milieu of these immune cells at an early time point, prior to the development of overt autoimmunity.
Our study has several limitations. First, we examined only a single time point. It is possible earlier time points would have identified more proximate genetic contributors to autoimmune predisposition in the NZBWF1 strain. Taken together, the current data represent an atlas of hematopoietic phenotype early in the course of disease development of the NZBWF1 mouse. The data are not only consistent with the B cell activation and interferon signaling expected of SLE, but also identify several novel areas of interest, most notably, the ferritin light chain signaling axis. Data sets such as the one from this study may be useful for gaining a better understanding of this devastating disease leading to better therapeutic approaches.