To define the cell populations that drive joint inflammation in rheumatoid arthritis (RA), we applied single-cell RNA sequencing (scRNA-seq), mass cytometry, bulk RNA sequencing (RNA-seq) and flow cytometry to T cells, B cells, monocytes, and fibroblasts from 51 samples of synovial tissue from patients with RA or osteoarthritis (OA). Utilizing an integrated strategy based on canonical correlation analysis of 5,265 scRNA-seq profiles, we identified 18 unique cell populations. Combining mass cytometry and transcriptomics revealed cell states expanded in RA synovia: THY1(CD90)+HLA-DRAhi sublining fibroblasts, IL1B+ pro-inflammatory monocytes, ITGAX+TBX21+ autoimmune-associated B cells and PDCD1+ peripheral helper T (TPH) cells and follicular helper T (TFH) cells. We defined distinct subsets of CD8+ T cells characterized by GZMK+, GZMB+, and GNLY+ phenotypes. We mapped inflammatory mediators to their source cell populations; for example, we attributed IL6 expression to THY1+HLA-DRAhi fibroblasts and IL1B production to pro-inflammatory monocytes. These populations are potentially key mediators of RA pathogenesis.
Access optionsAccess options
Subscribe to Journal
Get full journal access for 1 year
only $18.75 per issue
All prices are NET prices.
VAT will be added later in the checkout.
Rent or Buy article
Get time limited or full article access on ReadCube.
All prices are NET prices.
The single-cell RNA-seq data, bulk RNA-seq data, mass cytometry data, flow cytometry data, and the clinical and histological data for this study are available at ImmPort (https://www.immport.org/shared/study/SDY998, study accession code SDY998). The raw single-cell RNA-seq data are deposited in dbGaP (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001457.v1.p1). The source code repository of the computational and statistical analysis is located at https://github.com/immunogenomics/amp_phase1_ra. Data can also be viewed on three different websites at https://immunogenomics.io/ampra, https://immunogenomics.io/cellbrowser/, and https://portals.broadinstitute.org/single_cell/study/amp-phase-1.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Gibofsky, A. Epidemiology, pathophysiology, and diagnosis of rheumatoid arthritis: A Synopsis. Am. J. Manag. Care 20, S128–S135 (2014).
McInnes, I. B. & Schett, G. The pathogenesis of rheumatoid arthritis. N. Engl. J. Med. 365, 2205–2219 (2011).
Orr, C. et al. Synovial tissue research: a state-of-the-art review. Nat. Rev. Rheumatol. 13, 463–475 (2017).
Wolfe, F. et al. The mortality of rheumatoid arthritis. Arthritis Rheum. 37, 481–494 (1994).
Namekawa, T., Wagner, U. G., Goronzy, J. J. & Weyand, C. M. Functional subsets of CD4 T cells in rheumatoid synovitis. Arthritis Rheum. 41, 2108–2116 (1998).
Gizinski, A. M. & Fox, D. A. T cell subsets and their role in the pathogenesis of rheumatic disease. Curr. Opin. Rheumatol. 26, 204–210 (2014).
Reparon-Schuijt, C. C. et al. Secretion of anti-citrulline-containing peptide antibody by B lymphocytes in rheumatoid arthritis. Arthritis Rheum. 44, 41–47 (2001).
Mulherin, D., Fitzgerald, O. & Bresnihan, B. Synovial tissue macrophage populations and articular damage in rheumatoid arthritis. Arthritis Rheum. 39, 115–124 (1996).
Kinne, R. W., Bräuer, R., Stuhlmüller, B., Palombo-Kinne, E. & Burmester, G. R. Macrophages in rheumatoid arthritis. Arthritis Res. 2, 189–202 (2000).
Müller-Ladner, U. et al. Synovial fibroblasts of patients with rheumatoid arthritis attach to and invade normal human cartilage when engrafted into SCID mice. Am. J. Pathol. 149, 1607–1615 (1996).
Pap, T., Müller-Ladner, U., Gay, R. E. & Gay, S. Fibroblast biology. Role of synovial fibroblasts in the pathogenesis of rheumatoid arthritis. Arthritis Res. 2, 361–367 (2000).
Dennis, G. et al. Synovial phenotypes in rheumatoid arthritis correlate with response to biologic therapeutics. Arthritis Res. Ther. 16, R90 (2014).
Orange, D. E. et al. Identification of three rheumatoid arthritis disease subtypes by machine learning integration of synovial histologic features and RNA sequencing data. Arthritis Rheumatol. 70, 690–701 (2018).
Lindberg, J. et al. Variability in synovial inflammation in rheumatoid arthritis investigated by microarray technology. Arthritis Res. Ther. 8, R47 (2006).
Stephenson, W. et al. Single-cell RNA-seq of rheumatoid arthritis synovial tissue using low-cost microfluidic instrumentation. Nat. Commun. 9, 791 (2018).
Papalexi, E. & Satija, R. Single-cell RNA sequencing to explore immune cell heterogeneity. Nat. Rev. Immunol. 18, 35–45 (2018).
Schelker, M. et al. Estimation of immune cell content in tumour tissue using single-cell RNA-seq data. Nat. Commun. 8, 2032 (2017).
Rao, D. A. et al. Pathologically expanded peripheral T helper cell subset drives B cells in rheumatoid arthritis. Nature 542, 110–114 (2017).
Fonseka, C. Y. et al. Mixed-effects association of single cells identifies an expanded effector CD4+ T cell subset in rheumatoid arthritis. Sci. Transl. Med. 10, eaaq0305 (2018).
Villani, A. -C. et al. Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors.Science 356, eaah4573 (2017).
Mizoguchi, F. et al. Functionally distinct disease-associated fibroblast subsets in rheumatoid arthritis. Nat. Commun. 9, 789 (2018).
Donlin, L. T. et al. Methods for high-dimensional analysis of cells dissociated from cryopreserved synovial tissue. Arthritis Res. Ther. 20, 139 (2018).
Becher, B. et al. High-dimensional analysis of the murine myeloid cell system. Nat. Immunol. 15, 1181–1189 (2014).
De Maesschalck, R., Jouan-Rimbaud, D. & Massart, D. L. The Mahalanobis distance. Chemometrics Intellig. Lab. Syst. 50, 1–18 (2000).
Krenn, V. et al. Grading of chronic synovitis—a histopathological grading system for molecular and diagnostic pathology. Pathol. Res. Pract. 198, 317–325 (2002).
Maaten, Lvander & Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Todd, D. J. et al. XBP1 governs late events in plasma cell differentiation and is not required for antigen-specific memory B cell development. J. Exp. Med. 206, 2151–2159 (2009).
Rubtsov, A. V. et al. CD11c-expressing B cells are located at the T cell/B cell border in spleen and are potent APCs. J. Immunol. 195, 71–79 (2015).
Pillai, S. Now you know your ABCs. Blood 118, 1187–1188 (2011).
Ellebedy, A. H. et al. Defining antigen-specific plasmablast and memory B cell subsets in human blood after viral infection or vaccination. Nat. Immunol. 17, 1226–1234 (2016).
Wang, S. et al. IL-21 drives expansion and plasma cell differentiation of autoreactive CD11chi T-bet+ B cells in SLE. Nat. Commun. 9, 1758 (2018).
Pitzalis, C., Kelly, S. & Humby, F. New learnings on the pathophysiology of RA from synovial biopsies. Curr. Opin. Rheumatol. 25, 334–344 (2013).
Filer, A. The fibroblast as a therapeutic target in rheumatoid arthritis. Curr. Opin. Pharmacol. 13, 413–419 (2013).
Westra, H.-J. et al. Fine-mapping and functional studies highlight potential causal variants for rheumatoid arthritis and type 1 diabetes. Nat. Genet. 50, 1366–1374 (2018).
Al-Mayouf, S. M. et al. Loss-of-function variant in DNASE1L3 causes a familial form of systemic lupus erythematosus. Nat. Genet. 43, 1186–1188 (2011).
Snelling, S. J. B. et al. Dickkopf-3 is upregulated in osteoarthritis and has a chondroprotective role. Osteoarthritis Cartilage 24, 883–891 (2016).
Lee, E. B. et al. Tofacitinib versus methotrexate in rheumatoid arthritis. N. Engl. J. Med. 370, 2377–2386 (2014).
Zizzo, G., Hilliard, B. A., Monestier, M. & Cohen, P. L. Efficient clearance of early apoptotic cells by human macrophages requires M2c polarization and MerTK induction. J. Immunol. 189, 3508–3520 (2012).
Frara, N. et al. Transgenic expression of osteoactivin/gpnmb enhances bone formation in vivo and osteoprogenitor differentiation ex vivo. J. Cell. Physiol. 231, 72–83 (2016).
Jenks, S. A. et al. Distinct effector B cells induced by unregulated toll-like receptor 7 contribute to pathogenic responses in systemic lupus erythematosus. Immunity 49, 725–739.e6 (2018).
Smolen, J. S. How well can we compare different biologic agents for RA? Nat. Rev. Rheumatol. 6, 247–248 (2010).
McInnes, I. B. et al. The role of interleukin-15 in T-cell migration and activation in rheumatoid arthritis. Nat. Med. 2, 175 (1996).
McInnes, I. B. & Liew, F. Y. Cytokine networks—towards new therapies for rheumatoid arthritis. Nat. Clin. Pract. Rheumatol. 1, 31 (2005).
Hicks, S. C., Townes, F. W., Teng, M. & Irizarry, R. A. Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics 19, 562–578 (2018).
Parkhomenko, E., Tritchler, D. & Beyene, J. Sparse canonical correlation analysis with application to genomic data integration. Stat. Appl. Genet. Mol. Biol. 8, 1 (2009).
Witten, D. M., Tibshirani, R. & Hastie, T. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10, 515–534 (2009).
Hashimshony, T. et al. CEL-Seq2: sensitive highly-multiplexed single-cell RNA-Seq. Genome Biol. 17, 77 (2016).
Bendall, S. C. et al. Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science 332, 687–696 (2011).
Bjornson, Z. B., Nolan, G. P. & Fantl, W. J. Single-cell mass cytometry for analysis of immune system functional states. Curr. Opin. Immunol. 25, 484–494 (2013).
Peterson, V. M. et al. Multiplexed quantification of proteins and transcripts in single cells. Nat. Biotechnol. 35, 936–939 (2017).
Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9, 171–181 (2014).
Finck, R. et al. Normalization of mass cytometry data with bead standards. Cytometry A 83, 483–494 (2013).
González, I., Déjean, S., Martin, P. & Baccini, A. CCA: an R package to extend canonical correlation analysis. J. Stat. Software 23, 1–14 (2008).
Sing, T., Sander, O., Beerenwinkel, N. & Lengauer, T. ROCR: visualizing classifier performance in R. Bioinformatics 21, 3940–3941 (2005).
Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. 102, 15545–15550 (2005).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
Liberzon, A. et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
Reynolds, A. P., Richards, G., De La Iglesia, B. & Rayward-Smith, V. J. Clustering rules: a comparison of partitioning and hierarchical clustering algorithms. J. Math. Model. Algorithms 5, 475–504 (2006).
Rousseeuw, P. J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987).
This work was supported by the Accelerating Medicines Partnership (AMP) in Rheumatoid Arthritis and Lupus Network. AMP is a public-private partnership (AbbVie Inc., Arthritis Foundation, Bristol-Myers Squibb Company, Lupus Foundation of America, Lupus Research Alliance, Merck Sharp & Dohme Corp., National Institute of Allergy and Infectious Diseases, National Institute of Arthritis and Musculoskeletal and Skin Diseases, Pfizer Inc., Rheumatology Research Foundation, Sanofi and Takeda Pharmaceuticals International, Inc.) created to develop new ways of identifying and validating promising biological targets for diagnostics and drug development. Funding was provided through grants from the National Institutes of Health (UH2-AR067676, UH2-AR067677, UH2-AR067679, UH2-AR067681, UH2-AR067685, UH2-AR067688, UH2-AR067689, UH2-AR067690, UH2-AR067691, UH2-AR067694, and UM2-AR067678). K.S. is supported by the Ruth L. Kirschstein National Research Service Award (NIAMS F31AR070582). K.W. is supported by a Rheumatology Research Foundation Scientist Development Award, and KL2/Catalyst Medical Research Investigator Training award (an appointed KL2 award) from Harvard Catalyst, The Harvard Clinical and Translational Science Center (National Center for Advancing Translational Sciences, National Institutes of Health Award KL2 TR002542). D.A.R. is supported by NIAMS K08 AR072791-01. L.T.D. is supported by NIAMS K01 AR066063. J.H.A. is supported by R21 AR071670, and the Bertha and Louis Weinstein research fund. S.R. is supported by NIAMS 1R01AR063759-01A1, NIAID U19 AI111224, NHGRI U01 HG009379, and Doris Duke Charitable Foundation Grant #2013097. A.H.J. is supported by an Arthritis National Research Foundation Grant. A.F., C.D.B. and J.D.T. were supported by the Arthritis Research UK Rheumatoid Arthritis (#20298), and by the National Institute for Health Research (NIHR)’s Birmingham Biomedical Research Centre program, supported by the National Institute for Health Research/Wellcome Trust Clinical Research Facility at University Hospitals Birmingham NHS Foundation Trust.
Integrated supplementary information
Supplementary Figure 1 Flow cytometry gating scheme and a data-driven approach to separate samples on the basis of flow cytometry data.
a, Flow cytometry gating: stromal fibroblasts (CD45–PDPN+), monocytes (CD45+CD14+), T cells (CD45+CD3+), and B cells (CD45+CD3–CD19+). b, As a percentage of live cells: synovial T cells, B cells, and monocytes for OA-arthro (OA arthroplasty), RA-arthro (RA arthroplasty), and RA-biopsy (RA biopsy) by flow cytometry. c, Comparison of lymphocytes (T and B cells) and monocytes, as a percentage of live cells by flow cytometry. d, Mahalanobis distance from OA samples. Each dot represents a donor. Each panel highlights the contribution of T cells, B cells, and monocytes to the distance (y axis). We defined leukocyte-rich RA samples as those with Mahalanobis distance from OA greater than 4.5 (dashed line). We identified 19 leukocyte-rich RA, 17 leukocyte-poor RA, and 15 OA samples in our cohort.
Supplementary Figure 2 scRNA-seq analysis pipeline and distribution of identified subsets by scRNA-seq, flow cytometry, and protein fluorescence on each cell.
a, CCA-based integrative pipeline of scRNA-seq analysis. (1) We first select the highly variable genes from both scRNA-seq and bulk RNA-seq; (2) we integrate single cells with bulk samples based on the selected genes from both sides and learn a linear projection that the correlation between both sides are maximized using CCA; (3) we then calculate a cell-to-cell similarity matrix based on the top 10 canonical variates from CCA; (4) we build a k-nearest neighbors (KNN) network on the cell-to-cell similarity matrix and then convert it into an adjacency matrix; (5) we cluster the cells using the Infomap community detection algorithm to identify major groups on the cell-to-cell adjacency matrix; (6) we visualize the cells with tSNE; (7) we perform differential gene expression analysis on the identified cell type clusters and report three statistics: AUC, percent of non-zero expressing cells, and fold change; (8) finally, we perform gene set enrichment analysis to find pathways associated with each identified cell cluster. b, Silhouette analysis of 18 scRNA-seq clusters. The measure range is (−1, 1), where a value near 1 indicates a cell is far from neighboring clusters, a value near 0 indicates a cell is near a decision boundary, and a negative value indicates a cell is closer to a neighboring cluster. The features of the boxes are as follows. The box represents the 25th and 75th quantiles. The center line represents the 50th quantile. The low whisker is the lowest value greater than −1.5× the interquartile range plus the 25th quantile. The high whisker is the greatest value less than 1.5× the interquartile range plus the 75th quantile. The points are values outside the range of the whiskers. c, Number of cells for each cell type cluster across donors. d, Cellular composition of major synovial cell types for each donor by flow cytometry. e, Flow cytometry protein fluorescence of cell type markers on each single cell: PDPN, THY1 (CD90), CD45, CD19, CD14, and CD3.
a, Protein markers of synovial cells (3,000 downsampled) from all donors by mass cytometry. Color represents intensity of expression level. b–e, Distribution of identified subpopulations for each cell type. f, Cell counts of all clusters by comparing all the 26 donors reveal that leukocyte-rich donors show high cell abundance of HLA-DR+ fibroblasts (THY1+ CD34– HLA-DR+ and THY1+ CD34+ HLA-DR+), TPH cells (CD4+ PD-1+ ICOS+), two CD14+ monocytes subpopulations (CD11c+ CCR2+ and CD11c+ CD38+), and a B cell subpopulation (IgM+ IgD+ CD11c+).
Supplementary Figure 4 Comparison of CCA-based clustering and PCA-based clustering on batch effect correction performance.
a, Cells colored by 18 scRNA-seq clusters (top), 24 384-well plates (middle), and 21 donors (bottom) using the CCA-based integrative pipeline. b, Cells colored by scRNA-seq clusters (top), plates (middle), and donors (bottom) using PCA-based clustering by Seurat R package. Small clusters of cells from single donors are circled.
a, Correlation between cell density (cell counts per 200× field) and flow cytometric cell yields on B cells. b, Correlation between cell density (cell counts per 200× field) and flow cytometric cell yields on T cells. In general, we observed that the samples where we get the most single cell measurements are exactly the samples with the best yield and also the ones with the most inflammation.
To identify and subset the immune and stromal populations that emerged from the scRNA-seq analyses, we sorted synovial cell subsets and disaggregated synovial tissues based on markers revealed by the scRNA-seq analysis. a, Flow gating strategy for synovial fibroblasts. b, Flow gating strategy for synovial B cells. c, Flow gating strategy for synovial monocytes. See Methods for more details.
Supplementary Figure 7 Bulk RNA-seq analysis for flow sorted subpopulations of synovial fibroblasts, monocytes, and B cells to validate identified scRNA-seq clusters.
For each cell type, we trained a linear discriminant analysis (LDA) model on the scRNA-seq clusters. Next, we applied this LDA model to classify each bulk RNA-seq sample (Supplementary Fig. 6). After discovering scRNA-seq cluster markers (top 500 genes sorted by AUC for each cluster), we wanted to test if we could sort new cells from independent samples and see the same gene expression profiles in the new bulk samples as the original scRNA-seq samples. a, LDA projection of training data on single-cell fibroblasts (SC-F1-4). b, LDA projection of bulk RNA-seq samples that include sorted THY1-DR- populations from 4 OA, THY1+DR- population from 4 OA and 6 RA, and THY1+DR+ population from 6 RA samples. c, Posterior probabilities showing confidence of assigning each sorted fibroblast bulk sample to fibroblast scRNA-seq clusters. d, Genes (top 10 by Z-score) that are differentially expressed between two scRNA-seq clusters (SC-F2 and SC-F4) are also differentially expressed in the sorted bulk RNA-seq. e, LDA projection of training data on single-cell monocytes (SC-M1-4). f, LDA projection of bulk RNA-seq samples that include sorted CD14+CD11c+++CD38+++ population from 2 RA and CD14+CD11c+CD38– population from 2 OA. g, Posterior probabilities showing confidence of assigning each sorted monocyte bulk sample to monocyte scRNA-seq clusters. h. Genes (top 10 by Z-score) that are differentially expressed between two scRNA-seq clusters (SC-M1 and SC-M2) are also differentially expressed in the sorted bulk RNA-seq. i, LDA projection of training data on single-cell B cells (SC-B1-4). j, LDA projection of bulk RNA-seq samples that include sorted CD11c–IgD–CD27+ population from 6 RA, CD11c–IgD+CD27– population from 3 RA, CD19+CD11c+ population from 3 RA, and plasma cells from 3 RA. k, Posterior probabilities showing confidence of assigning each sorted B cell bulk sample to B cell scRNA-seq clusters. l, Genes (top ten by z score) that are differentially expressed between two scRNA-seq clusters (SC-B3 and SC-B4) are also differentially expressed in the sorted bulk RNA-seq.
a, Enriched pathways on each monocyte cluster by scRNA-seq. b, Enriched pathways on each B cell cluster by scRNA-seq. c, Identified Treg (SC-T2) and TPH and TFH (SC-T3) scRNA-seq clusters. We used hierarchical clustering with R functions hclust() and cutree(k = 2) to pinpoint the previously characterized rare cell populations.
Supplementary Figure 9 Multicolor immunofluorescent staining of paraffin synovial tissue from target RA and OA patient samples.
We performed multi-color immunofluorescence images to validate the unique cellular and molecular signatures revealed by the scRNA-seq analysis and show the contrasting cellular and molecular features of the microenvironment in the inflamed RA and OA synovia. a, Numerous CD14+IL-1β+ colocalization in inflamed RA synovium tissue (denoted by white box). b, Very few CD14+IL-1β+ cells in the OA synovium. c, Numerous CD3+CD8+IFNγ+ T cells in RA biopsy (denoted by white box). d, Low numbers of CD3+CD8+IFNγ+ T cells in the synovium of OA. e, An increased number of CD20+T-bet+CD11c+ B cells in the inflamed RA synovium (denoted by white box). f, Very scared B cells and specially T-bet+CD20+ B cells in synovial tissues of OA. Globally, we found enrichment for the populations of interest in the biopsies of RA inflamed synovium, compared to specific populations in OA synovia. Images were acquired at ×200 magnification. Scale bar, 100 μm.
a, RA synovial tissue samples were disaggregated, stained for surface markers and intracellular granzyme B (GZMB) and granzyme K (GZMK), and analyzed by flow cytometry. Shown are plots of GZMB versus GZMK expression by CD8 T cells from three representative tissue specimens out of eight total tissues analyzed. b, GZMK and GZMB expression patterns by HLA-DR+ CD8 T cells. c, IFNγ production by CD4 and CD8 T cells from RA synovial tissue, measured by intracellular flow cytometry after stimulation with PMA/ionomycin. Cells from the same synovial tissue sample are connected by a line. (one–tailed Student’s t test P = 0.028, t value = 2.1, df = 10.94). d, TNF production by CD4 and CD8 T cells from RA synovial tissue.
a, Quality control of bulk RNA-seq samples. Common genes are defined as the set of genes detected with at least 1 mapped fragment in 95% of the samples (13,041 genes). X axis is the number of cells for each bulk RNA-seq sample. Y axis is the percentage of detected common genes for each sample. We discarded 25 low quality samples that have less than 99% (dashed line) of common genes detected, resulting 167 post–QC samples in all. b, PCA analysis on all the post–QC samples shows that most of the variance in the bulk RNA-seq data is due to cell type. c, Cell type marker genes show that there is no obvious contamination in the bulk RNA-seq data. d–g, PCA analysis on samples from each cell type. The samples from leukocyte-rich RA appear distinct from leukocyte-poor RA and OA samples. This difference in transcriptional signatures in inflamed tissue is largely determined by altered cellular composition. h,i, Distribution of significantly enriched GO terms in leukocyte-rich RA by GSEA. Leukocyte-rich fibroblasts and monocytes share the common pathways of Type I interferon and inflammatory response. j, Correlation between bulk RNA-seq genes and immune cell type abundances in RA synovial fibroblasts. Integrating bulk RNA-seq samples from fibroblasts with multiple cell type flow gates reveals that T cells, B cells, and monocytes that are abundant in RA synovial tissue directly influence the expression of fibroblasts in the RA synovium.
Supplementary Figure 12 Correlation between bulk RNA-seq expression and proportion of non-zero expressing cells on scRNA-seq cluster markers.
The expression level of a gene in a bulk RNA-seq sample can indicate the abundance of cells expressing that gene in the bulk sample. In other words, it is possible to infer the abundance of some cellular subpopulations from bulk RNA-seq data. We depict two marker genes per scRNA-seq cluster and show the bulk RNA-seq expression (x axis) is correlated with the percent of non-zero expressing cells over the total number of cells (y axis) for the overlapped a, fibroblast samples; b, monocyte samples; c, B cell samples; and d, T cell samples. Pearson’s R squared and P value are given using the limma R package for each correlation scenario. The gray zone represents 95% confidence level interval for predictions from a linear model.
Supplementary Figure 13 Correlation between mean proteomic expression by mass cytometry and transcriptomic expression by bulk RNA-seq on the overlapped samples.
Two typical protein/gene markers per cell type were shown for a, fibroblast samples on THY1 and CD34; b, monocyte samples on CD38 and CCR2; c, B cell samples on CD11c and IgD; and d, T cell samples on CD8a and PDCD1. Pearson’s R squared and P value are given using the limma R package for each correlation scenario. The gray zone represents 95% confidence level interval for predictions from a linear model.
We selected two marker genes expected to be exclusively expressed in each of the four cell types: PDGFRA and ISLR for fibroblasts, CD2 and CD3D for T cells, CD79A and RALGPS2 for B cells, and CD14 and C1QA for monocytes. We counted nonzero expression of these genes in the correct cell type as a true positive and nonzero expression in the incorrect cell type as a false positive. a, Estimated optimal thresholds for reads per unique molecular identifier (UMI) shown for three example quadrants (Q2, Q1, Q4) of three scRNA-seq plates (S006, S008, S011). The reader per UMI threshold determines which UMIs we discard. By discarding UMIs with few supporting reads, we can reduce the false positive rate (FPR), but this comes at the cost of also reducing the true positive rate (TPR). b, After selecting the optimal threshold that maximizes the ratio of TPR to FPR for each quadrant, we visualize the FPR and TPR for each quadrant. We noticed that some quadrants had poor performance even after optimizing the reads per UMI, so we discarded the cells from the red outlier quadrants. c, An example of gene expression for MMP2 across all cells before and after filtering the reads per UMI threshold. This matrix metalloproteinase is known to be expressed by fibroblasts, but not by other cell types, so we did not expect to see MMP2 expression in cell types other than fibroblasts. After applying the reads per UMI threshold to discard UMIs that are probably contaminants, MMP2 gene expression is closer to our a priori expectation: very few B cells, monocytes, and T cells express this gene. Note that we did not use MMP2 to determine the optimal reads per UMI threshold.
We excluded cells with less than 1,000 genes detected (at least 1 read). We also excluded cells with more than 25% of UMIs coming from 32 mitochondrial genes. This figure shows the post-QC single cells used in our scRNA-seq analyses.