Skip to main content

Thank you for visiting You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

Broad immune activation underlies shared set point signatures for vaccine responsiveness in healthy individuals and disease activity in patients with lupus


Responses to vaccination and to diseases vary widely across individuals, which may be partly due to baseline immune variations. Identifying such baseline predictors of immune responses and their biological basis is of broad interest, given their potential importance for cancer immunotherapy, disease outcomes, vaccination and infection responses. Here we uncover baseline blood transcriptional signatures predictive of antibody responses to both influenza and yellow fever vaccinations in healthy subjects. These same signatures evaluated at clinical quiescence are correlated with disease activity in patients with systemic lupus erythematosus with plasmablast-associated flares. CITE-seq profiling of 82 surface proteins and transcriptomes of 53,201 single cells from healthy high and low influenza vaccination responders revealed that our signatures reflect the extent of activation in a plasmacytoid dendritic cell–type I IFN–T/B lymphocyte network. Our findings raise the prospect that modulating such immune baseline states may improve vaccine responsiveness and mitigate undesirable autoimmune disease activity.

Access options

Rent or Buy article

Get time limited or full article access on ReadCube.


All prices are NET prices.

Fig. 1: Study questions and the derivation of a baseline, pre-vaccination signature predictive of response using an influenza vaccination cohort.
Fig. 2: Assessing TGSig in independent influenza vaccination, yellow fever vaccination and SLE datasets.
Fig. 3: A transcriptional correlate of plasmablast-associated disease activity in SLE is also associated with influenza vaccination responses and functionally related to TGSig.
Fig. 4: CITE-seq (simultaneous protein and transcriptome expression profiling in single cells) analysis of high and low influenza vaccine responders.
Fig. 5: Dissecting the cellular origin of baseline signatures.
Fig. 6: The correlative relationship among the single-cell-cluster-based and bulk (PBMC) signatures and a proposed cellular circuit whose pre-perturbation status determines future responsiveness.

Data availability

All data used in this study, including CITE-seq and flow data, are available at figshare ( The original/raw public gene expression data are available in the National Center for Biotechnology Information Gene Expression Omnibus (GEO) under accession numbers: GSE47353, GSE41080, GSE59654, GSE59743, GSE29619, GSE74817, GSE13486 and GSE65391 (see also Supplementary Table 2).

Code availability

The source code and software pipeline to reproduce our analyses can be assessed at


  1. 1.

    Kotas, M. E. & Medzhitov, R. Homeostasis, inflammation, and disease susceptibility. Cell 160, 816–827 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  2. 2.

    Chen, D. S. & Mellman, I. Elements of cancer immunity and the cancer-immune set point. Nature 541, 321–330 (2017).

    CAS  PubMed  Google Scholar 

  3. 3.

    Tsang, J. S. Utilizing population variation, vaccination, and systems biology to study human immunology. Trends Immunol. 36, 479–493 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  4. 4.

    Willis, J. C. D. & Lord, G. M. Immune biomarkers: the promises and pitfalls of personalized medicine. Nat. Rev. Immunol. 15, 323 (2015).

    CAS  PubMed  Google Scholar 

  5. 5.

    Brodin, P. & Davis, M. M. Human immune system variation. Nat. Rev. Immunol. 17, 21–29 (2017).

    CAS  PubMed  Google Scholar 

  6. 6.

    Hagan, T. et al. Antibiotics-driven gut microbiome perturbation alters immunity to vaccines in humans. Cell 178, 1313–1328.e13 (2019).

    CAS  PubMed  Google Scholar 

  7. 7.

    Brodin, P. et al. Variation in the human immune system is largely driven by non-heritable influences. Cell 160, 37–47 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  8. 8.

    Marchant, A. et al. Predominant influence of environmental determinants on the persistence and avidity maturation of antibody responses to vaccines in infants. J. Infect. Dis. 193, 1598–1605 (2006).

    PubMed  Google Scholar 

  9. 9.

    Krieg, C. et al. High-dimensional single-cell analysis predicts response to anti-PD-1 immunotherapy. Nat. Med. 24, 144 (2018).

    CAS  PubMed  Google Scholar 

  10. 10.

    HIPC-CHI Signatures Project Team & HIPC-I Consortium. Multicohort analysis reveals baseline transcriptional predictors of influenza vaccination responses. Sci. Immunol. 2, eaal4656 (2017).

  11. 11.

    Fourati, S. et al. Pre-vaccination inflammation and B-cell signalling predict age-related hyporesponse to hepatitis B vaccination. Nat. Commun. 7, 10369 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  12. 12.

    Tsang, J. S. et al. Global analyses of human immune variation reveal baseline predictors of postvaccination responses. Cell 157, 499–513 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  13. 13.

    Banchereau, R. et al. Personalized immunomonitoring uncovers molecular networks that stratify lupus patients. Cell 165, 551–565 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  14. 14.

    Tsokos, G. C., Lo, M. S., Reis, P. C. & Sullivan, K. E. New insights into the immunopathogenesis of systemic lupus erythematosus. Nat. Rev. Rheumatol. 12, 716–730 (2016).

    CAS  PubMed  Google Scholar 

  15. 15.

    Lee, J. C. & Smith, K. G. C. Prognosis in autoimmune and infectious disease: new insights from genetics. Clin. Transl. Immunol. 3, e15 (2014).

    Google Scholar 

  16. 16.

    Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865–868 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

  17. 17.

    Lu, Y. et al. Systematic analysis of cell-to-cell expression variation of T lymphocytes in a human cohort identifies aging and genetic associations. Immunity 45, 1162–1175 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  18. 18.

    Nakaya, H. I. et al. Systems analysis of immunity to influenza vaccination across multiple years and in diverse populations reveals shared molecular signatures. Immunity 43, 1186–1198 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  19. 19.

    Querec, T. D. et al. Systems biology approach predicts immunogenicity of the yellow fever vaccine in humans. Nat. Immunol. 10, 116–125 (2009).

    CAS  PubMed  Google Scholar 

  20. 20.

    Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinf. 9, 559 (2008).

    Google Scholar 

  21. 21.

    Li, S. et al. Molecular signatures of antibody responses derived from a systems biology study of five human vaccines. Nat. Immunol. 15, 195–204 (2014).

    PubMed  Google Scholar 

  22. 22.

    Muskardin, T. L. W. & Niewold, T. B. Type I interferon in rheumatic diseases. Nat. Rev. Rheumatol. 14, 214–228 (2018).

    CAS  PubMed  PubMed Central  Google Scholar 

  23. 23.

    Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).

    CAS  PubMed  PubMed Central  Google Scholar 

  24. 24.

    Swiecki, M. & Colonna, M. The multifaceted biology of plasmacytoid dendritic cells. Nat. Rev. Immunol. 15, 471–485 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  25. 25.

    Jego, G. et al. Plasmacytoid dendritic cells induce plasma cell differentiation through type I interferon and interleukin 6. Immunity 19, 225–234 (2003).

    CAS  PubMed  Google Scholar 

  26. 26.

    Gricks, C. S. et al. Differential regulation of gene expression following CD40 activation of leukemic compared to healthy B cells. Blood 104, 4002–4009 (2004).

    CAS  PubMed  Google Scholar 

  27. 27.

    Shimabukuro-Vornhagen, A. et al. Inhibition of protein geranylgeranylation specifically interferes with CD40-dependent B cell activation, resulting in a reduced capacity to induce T cell immunity. J. Immunol. 193, 5294–5305 (2014).

    CAS  PubMed  Google Scholar 

  28. 28.

    Elgueta, R. et al. Molecular mechanism and function of CD40/CD40L engagement in the immune system. Immunol. Rev. 229, 152–172 (2009).

    CAS  PubMed  Google Scholar 

  29. 29.

    Arriens, C., Wren, J. D., Munroe, M. E. & Mohan, C. Systemic lupus erythematosus biomarkers: the challenging quest. Rheumatology 56, i32–i45 (2017).

    CAS  PubMed  Google Scholar 

  30. 30.

    Verheul, M. K., Fearon, U., Trouw, L. A. & Veale, D. J. Biomarkers for rheumatoid and psoriatic arthritis. Clin. Immunol. 161, 2–10 (2015).

    CAS  PubMed  Google Scholar 

  31. 31.

    Banchereau, J. & Pascual, V. Type I interferon in systemic lupus erythematosus and other autoimmune diseases. Immunity 25, 383–392 (2006).

    CAS  PubMed  Google Scholar 

  32. 32.

    Quách, T. D. et al. Distinctions among circulating antibody-secreting cell populations, including B-1 cells, in human adult peripheral blood. J. Immunol. 196, 1060–1069 (2016).

    PubMed  PubMed Central  Google Scholar 

  33. 33.

    Andrews, S. F. et al. High preexisting serological antibody levels correlate with diversification of the influenza vaccine response. J. Virol. 89, 3308–3317 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  34. 34.

    He, X.-S. et al. Baseline levels of influenza-specific CD4 memory T-cells affect T-cell responses to influenza vaccines. PLoS One 3, e2574 (2008).

    PubMed  PubMed Central  Google Scholar 

  35. 35.

    Patin, E. et al. Natural variation in the parameters of innate immune cells is preferentially driven by genetic factors. Nat. Immunol. 19, 302–314 (2018).

    CAS  PubMed  Google Scholar 

  36. 36.

    Westra, H. J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  37. 37.

    Klein, S. L. & Flanagan, K. L. Sex differences in immune responses. Nat. Rev. Immunol. 16, 626–638 (2016).

    CAS  PubMed  Google Scholar 

  38. 38.

    Rees, F., Doherty, M., Grainge, M. J., Lanyon, P. & Zhang, W. The worldwide incidence and prevalence of systemic lupus erythematosus: a systematic review of epidemiological studies. Rheumatology 56, 1945–1961 (2017).

    PubMed  Google Scholar 

  39. 39.

    Zyla, J. et al. Gene set enrichment for reproducible science: comparison of CERNO and eight other algorithms. Bioinformatics 35, 5146–5154 (2019).

    PubMed  PubMed Central  Google Scholar 

  40. 40.

    Korotkevich, G., Sukhov, V. & Sergushichev, A. Fast gene set enrichment analysis. Preprint at bioRxiv (2019).

  41. 41.

    van Lochem, E. G. et al. Immunophenotypic differentiation patterns of normal hematopoiesis in human bone marrow: reference patterns for age-related changes and disease-induced shifts. Cytometry B Clin. Cytom. 60, 1–13 (2004).

    PubMed  Google Scholar 

  42. 42.

    Finak, G. et al. ImmuneSpaceR: A Thin Wrapper around the ImmuneSpace Database. (2014).

  43. 43.

    Bhattacharya, S. et al. ImmPort: disseminating data to the public for the future of immunology. Immunol. Res. 58, 234–239 (2014).

    CAS  PubMed  Google Scholar 

  44. 44.

    Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinf. 12, 77 (2011).

    Google Scholar 

  45. 45.

    Bates, D., Bolker, B. M., Maechler, M. & Walker, S. C. Fitting linear mixed-effects models using lme4. J. Stat. Softw. (2015).

  46. 46.

    Chaussabel, D. et al. A modular analysis framework for blood genomics studies: application to systemic lupus erythematosus. Immunity 29, 150–164 (2008).

    CAS  PubMed  PubMed Central  Google Scholar 

  47. 47.

    Weiner, J. tmod: Feature Set Enrichment Analysis for Metabolomics and Transcriptomics. R package version 0.40 (2018);

  48. 48.

    Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289–300 (1995).

    Google Scholar 

  49. 49.

    Choi, J. K., Yu, U., Kim, S. & Yoo, O. J. Combining multiple microarray studies and modeling interstudy variation. Bioinformatics 19 (Suppl. 1), i84–i90 (2003).

    PubMed  Google Scholar 

  50. 50.

    Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).

    CAS  PubMed  Google Scholar 

  51. 51.

    Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

    CAS  PubMed  Google Scholar 

  52. 52.

    Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

    PubMed  PubMed Central  Google Scholar 

  53. 53.

    Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag, 2016).

  54. 54.

    Torchiano, M. effsize: Efficient Effect Size Computation. R package version 0.7.6 (2019);

  55. 55.

    Wang, X. et al. An R package suite for microarray meta-analysis in quality control, differentially expressed gene analysis and pathway enrichment detection. Bioinformatics 28, 2534–2536 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  56. 56.

    Seshan, V. clinfun: Clinical Trial Design and Data Analysis Functions. R package version 1.0.15 (2018);

  57. 57.

    Yan, Y. MLmetrics: Machine Learning Evaluation Metrics. R package version 1.1.1 (2016);

  58. 58.

    Stoeckius, M. et al. Cell Hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics. Genome Biol. 19, 224 (2018).

    CAS  PubMed  PubMed Central  Google Scholar 

  59. 59.

    Stoeckius, M. et al. CITE-seq protocols (2017);

  60. 60.

    Kang, H. M. et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 36, 89–94 (2018).

    CAS  PubMed  Google Scholar 

  61. 61.

    Roelli, P. et al. Hoohm/CITE-seq-Count: 1.4.2. (Zenodo, 2019);

  62. 62.

    Lun, A. T. L. & Risso, D. SingleCellExperiment: S4 Classes for Single Cell Data. R package version 1.4.1. (2019);

  63. 63.

    Mule, M. P., Martins, A. J. & Tsang, J. S. Normalizing and denoising protein expression data from droplet-based single cell profiling. bioRxiv (in the press).

  64. 64.

    Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).

    PubMed  PubMed Central  Google Scholar 

  65. 65.

    Lun, A. T., Bach, K. & Marioni, J. C. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17, 75 (2016).

    PubMed  Google Scholar 

  66. 66.

    Waltman, L. & Eck, N. Jvan A smart local moving algorithm for large-scale modularity-based community detection. Eur. Phys. J. B 86, 471 (2013).

    Google Scholar 

  67. 67.

    Zappia, L. & Oshlack, A. Clustering trees: a visualization for evaluating clusterings at multiple resolutions. GigaScience 7, giy083 (2018).

    PubMed Central  Google Scholar 

  68. 68.

    Pederson, T. L. ggraph: An Implementation of Grammar of Graphics for Graphs and Networks. R package version 1.0.2. (2018);

  69. 69.

    Carlson, M. hgu95av2.db: Affymetrix Human Genome U95 Set annotation data (chip hgu95av2). R package version 3.2.3. (2016);

  70. 70.

    Chen, J., Bardes, E. E., Aronow, B. J. & Jegga, A. G. ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res. 37, W305–W311 (2009).

    CAS  PubMed  PubMed Central  Google Scholar 

Download references


We thank members of the Tsang Lab and the NIH Center for Human Immunology (CHI) for inputs and discussions; L. Failla for help with figures; D. Hirsch for testing the R code; and R. Germain and R. Apps for comments on the manuscript. This research was supported by the Intramural Research Program of the National Institute of Allergy and Infectious Diseases (NIAID) and Intramural Programs of the NIH Institutes supporting the Center for Human Immunology. Figure 6 was created with

Author information




J.S.T conceived and designed the study; R.S. and Y.K. collected and evaluated public datasets; Y.K., Y.L., A.J.M, M.P.M. and J.S.T designed data analysis strategies with inputs from R.S.; Y.K., Y.L., M.P.M. and A.J.M. performed data analysis with help from R.S., C.C.L. and F.C.; J.C. and P.L.S. contributed samples for CITE-seq; A.J.M, M.P.M. and N.B. developed protocols and generated CITE-seq data; M.G. and A.J.M. analyzed flow cytometry data prepared by A.B.; R.B. and V.P. provided SLE data; L.K. and S.M. sorted cells from healthy donors for RNA-seq analysis performed by A.J.M.; J.S.T. and R.S. wrote the manuscript with contributions from Y.K.; A.J.M., M.P.M. and Y.L. are co-second authors and contributed equally; J.S.T. supervised the study; all authors approved the manuscript.

Corresponding author

Correspondence to John S. Tsang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Saheli Sadanand was the primary editor on this article and managed its editorial process in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Identification and characterization of the CD19+CD20+CD38++B cell population, a baseline, pre-vaccination cell frequency-based signature (CBSig) of antibody responses to influenza vaccination.

a, Flow cytometric gating strategy for the CD19+CD20+CD38++ B cell population. Populations 1–4 are further described in (Fig. 1b). b, Box plots (top) showing the frequency of CD19+CD20+CD38++ cells (CBSig; y axis) at the three baseline time points from ref. 12 (days −7 and 0 are prior to vaccination and day 70 is after vaccination) in low and high responders (x axis) to the seasonal and pandemic H1N1 influenza vaccines as defined by the Adjusted Maximum Fold Change (adjMFC) metric (see ref. 12). There are 11 low and 12 high responders for day −7 and 0, and 10 low and 11 high responders for day 70. P values from the Wilcoxon one-tailed test results are shown on the box plots (based on results from ref. 12 our hypothesis was that the high responders have higher frequencies of these cells than low responders). Boxplots’ center line corresponds to the median value, lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles); lower and upper whiskers extend from the box to the smallest or largest value correspondingly, but no further than 1.5x inter-quantile range. (Bottom) Corresponding receiver operator curves (ROC) for vaccine response at each of the above baseline time points and their AUC (area under the curve) and corresponding permutation-based one-tailed p value are shown. c, Dot plots (CD38 vs. CD10 of CD19+CD20+B cells) for example high and low responders. d, Glossary of major abbreviations used in the study.

Extended Data Fig. 2 Derivation of TGSig, a transcriptional surrogate signature for CBSig.

a, Step 1: Identification of genes with high temporal stability across the three baseline time points (days −7 and 0 are prior to vaccination and day 70 is after vaccination) in the NIH influenza study12. The middle box shows the distribution of the temporal stability metric (TSM) across all the genes. The boxes on the right and left show examples of genes with high and low temporal stability, respectively; each line corresponds to an individual. Genes with high temporal stability (≥0.75) across the three baseline time points (depicted to the right of the red dashed line) were subsequently evaluated for correlation with CD19+CD20+CD38++ B cell frequency. b, Step 2: 726 temporally stable genes were ranked by their ‘robust’ correlation with CD19+CD20+CD38++ B cell frequency. Robustness is evaluated using all 231 random samplings of 20 subjects out of the cohort of 22 subjects (that is, two random subjects were dropped out from each sampling); the mean Spearman correlation coefficient divided by the standard deviation across the samplings (x axis left panel) was used to rank the genes. Top genes are shown together with the predictive performance of each gene evaluated at day 0 (AUC; right panel). The red dashed line in the right panel corresponds to AUC=0.50 (prediction performance as expected by chance); the top 10 genes were selected in TGSig (the black dashed line; see Supplementary Table 1 for full list of ranked genes) based on (c). c, Performance (AUC; y axis) of the gene signature by baseline time point (different lines) and number of top genes included in computing the signature score (x axis). The vertical dashed line corresponds to a gene signature (TGSig) containing the top 10 genes achieving the best AUC across all three time points. d, Schema for gene signature score calculation. Gene expression data is standardized through calculation of Z-scores for each gene (that is, each gene would have mean 0 and standard deviation 1). The Z-scores for each gene in the signature are then averaged to generate the gene signature score. e, Distribution of predictive performance (AUC; x axis) of 500 random top-10 gene signatures generated from subject-label shuffled gene expression data using the robust correlation metric approach described above. The dashed red line indicates the observed AUC of TGSig at each of the baseline time points. One-tailed empirical p-values are shown. f, Relative rank (rank position divided by the total number of genes) of the top 20 genes from (b) and Supplementary Table 1 at different TSM thresholds. The black line shows TSM cutoff=0.75, the value used in selecting the top 10 genes for inclusion in TGSig (boxed). g, Change in AUC (x axis) of TGSig score following removal of the indicated gene in the signature (y axis) at each of the three baseline time points in the NIH influenza study.

Extended Data Fig. 3 Comparison among response classes by including middle responders and evaluating TGSig in the influenza datasets from Emory18.

a, similar to Fig. 1e, f but including middle responders (for day 0 n=11/19/13 for low/middle/high responders, respectively; for day -7 n = 10/22/14 and for day 70 n = 11/21/12); p values from the Jonckheere trend test (with an a priori alternative hypothesis that the high responders >= middle responders >= low responders.) For all box plots the center line corresponds to the median value, lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles); lower and upper whiskers extend from the box to the smallest or largest value correspondingly, but no further than 1.5x inter-quantile range. b, Similar to (a) but related to Fig. 2a; note that Yale 2012 does not have middle responders based on data retrieved from ImmuneSpace (n = 9/10/10 (Stanford 2008), 7/3/6 (Yale 2010), 7/0/8 (Yale 2012)). c, Similar to Fig. 2a but testing TGSig using influenza datasets from Emory University over four years (n = 14/8 low/high responders for year 2008, 8/8 for 2009, 11/11 for 2010, 12/10 for 2011). Box plots (top) showing the TGSig score (y axis) in low and high responders (x axis) as defined by adjMFC in the indicated season. P values shown on the box plots were obtained from the Wilcoxon one-tailed test. (bottom) Corresponding receiver operator curves (ROC) for vaccine response and the AUC (area under the curve) and corresponding permutation-based one-tailed p values are shown. d, Similar to (a) but for yellow fever and related to Fig. 2b.

Extended Data Fig. 4 Further evaluation of TGSig in yellow fever and influenza datasets.

a, (top) Box plots (top) showing the TGSig score applied to pre-vaccination PBMC expression data (y axis) between low and high responders (x axis) to yellow fever vaccine in trial #2 (x axis; 4 high vs. 3 low responders). (bottom) Corresponding receiver operator curves (ROC) for vaccine response and the AUC (area under the curve) and corresponding one-tailed permutation-based p value are shown. This vaccination cohort included 10 subjects (see Fig. 2b for results on a first, larger trial with 15 subjects). Boxplots’ center line corresponds to the median value, lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles); lower and upper whiskers extend from the box to the smallest or largest value correspondingly, but no further than 1.5x inter-quantile range. P-value is from Wilcoxon one-tailed test. b, Forest plot showing the meta-effect sizes (Hedge’s g reflecting correlation strength with adjMFC) of the TGSig genes from a meta-analysis of four influenza datasets (Stanford 2008, NIH 2009, Yale 2011, Yale 2012; genes in gray were not present in all datasets); the bars represent the 95% CI. c, Similar to (b) but for yellow fever vaccination (computed from trial #1).

Extended Data Fig. 5 SLE patient phenotypes based on DA-associated transcriptional signatures and assessing the association between TGSig (at low DA) and DaCP after the removal of patients with lower DA-associated plasmablast signature scores.

a, Recreation of Figure 6a from ref. 13 but as a robustness check here we used a different method: for each patient we ranked genes based on the difference between their average expression at high/middle and low DA time points and then performed GSEA analysis using the blood transcriptomic modules (columns) to fingerprint the patient. The colors on the heatmap denote the statistical significance (-log10 of BH-adjusted p-values from CERNO39) of the gene set enrichment test. Here we kept the order of patients (rows) and modules (columns) the same as in the original heatmap, although some patients from the original heatmap were removed due to the absence of low DA time points. The original patient groups were used to annotate the patients (rows). The overall fingerprinting pattern is visually highly consistent with the original heatmap. This heatmap/matrix was then used to construct Fig. 2c by averaging individual patient values within each patient and phenotype group combination. b, DaCP versus average plasmablast score difference between high (or middle if no high was available) and low DA time points in SLE patients from patient groups 2, 3 (in blue, n = 19) or groups 2, 3, and 4 (group 4 in grey, n = 12) from ref. 13 (see patient groups in Fig. 2c). Pearson correlation coefficients and two-tailed P values are shown. Spearman rho for groups 2 and 3 is 0.87 (two-tailed p = 2.2 × 10−16); and for groups 2, 3 and 4 is 0.85 (two-tailed p = 4.9 × 10−7). c, Similar to Fig. 2e, but here patients are from patient groups 2 and 3 only (n = 22) and are shaded based on their DaCP. Note that there are more patients here for analysis from groups 2 and 3 than those shown in (a) and (b) because there we required that every patient has at least one low DA and one high/mid DA sample, while here (and in Fig. 2) the DaCP was estimated using all patients with at least one sample including those without high/mid DA time points (see Methods). Pearson correlation coefficient and two-tailed p values are shown. Spearman correlation coefficient is 0.47 (p = 0.029 two-tailed). d, Evaluating the correlation between DaCP and mean TGSig score at low DA time points (as in (c)) by removing patients with DaCP below the indicated cutoff (y axis). The first panel shows the number of patients in the evaluation given the threshold; the second and third panels show the corresponding Pearson r and two-tailed p value (shown as -log10(p)).

Extended Data Fig. 6 Evaluating the predictive capacity and information overlap among the TGSig, SLE-Sig, and IFN-I-DCact signatures.

a–d, The predictive profile of SLE-Sig (Fig. 3e) (a), TGSig (b), IFN-I-DCact (Fig. 3f) (c), and the non-leading edge genes from the brown module (Fig. 3e) (d) used as the sole predictor in logistic regression models of high versus low influenza vaccination responder status. Influenza vaccination data pooled from four datasets (Stanford 2008, NIH 2009, Yale 2011, Yale 2012) were used (n = 71 high and low responders). Note that for the brown module (Fig. 3a–d), most of the predictive information come from the leading-edge genes since the signature score of genes outside of the leading edge is not predictive (shown in (d)). e, f when both TGSig and SLE-Sig were used as predictors in the logistic regression (e) or when both TGSig and IFN-I-DCact score were used as predictors (f). In these graphs, the predictor scores are shown on the x axis and the probability that a high responder falls within the predictor score bin is shown on the y axis. The error bars correspond to 95% CIs. The two-tailed p value indicates the probability that the coefficient (“effect”) of the term (for example, TGSig score) in the logistic regression is 0.

Extended Data Fig. 7 RNA-seq analysis of CD19+CD20+CD38++ B cells sorted from healthy individuals.

a, Sorting strategy and approach: CD19+CD20+and CD20+CD38++ B cell populations were isolated by FACS from peripheral blood samples of six healthy donors. RNA-seq libraries were prepared for each isolated sample using the Nugen Ovation SoLo low-input RNA-seq library preparation kit. b, Differential gene expression between the CD20+CD38++ B cells and parental CD19+CD20+B cells from six paired samples was compared using DESeq2. Plot shows the log2 fold change versus the log2 of the mean normalized counts across all samples for each gene. TGSig genes are shown in red; genes from the differentially expressed gene set (BH-adjusted two-tailed p-value <1%, log of mean normalized counts >1; total genes in set: 105) that fall into the top enriched Gene Ontology Biological Processes category “Cell Activation” (enrichment analysis done using ToppGene70) are shown in cyan (21 genes). c, Enrichment of the 87 SLE-Sig genes in genes ranked by differential expression between CD19+CD20+CD38++ versus CD19+CD20+cells. The p value shown was computed from the GSEA test. d, Similar to (c) but instead of the SLE-Sig genes here the top k (k = 10 (TGSig), 30, 50) genes correlated with the frequency of CD20+CD38++ B cells is assessed (only 713 temporally stable genes with TSM≥ 0.75 were included in the analysis); also see Extended Data Fig. 2. e, Similar to (d) but using 7731 genes with TSM ≥ 0.5. A lower/more relaxed TSM cutoff was used to evaluate whether by starting with more genes (therefore potentially more statistical power for enrichment analysis) an enrichment signal can be detected.

Extended Data Fig. 8 Supporting data for CITE-seq single cell analysis to dissect the cellular origin of baseline signatures.

a, Single cell scatterplots of key markers in the CD4+T cell clusters. C0.0.0 (Naïve), C1.0.0 (Central/Transitional Memory), and C1.1.0 (TEMRA/Effector memory) clusters show different distributions of CD62L, CD45RA, CD27, and CD28. b, Distributions of key markers in the CD8+Memory T cell clusters. CD45RA vs. CD62L expression are shown in the top panels, ridge plots of CD45RO and CD28 are shown in lower panels. Similar to CD4+cells, these CD8+clusters show differences in CD62L and CD45RA distributions; C4.0.1 (TEMRA/Effector Memory) are mainly CD62L negative and CD28 negative, with CD45RA+(TEMRA) and CD45RO+(Effector Memory) subsets within this cluster. C4.0.0 and C4.0.3 show highly similar protein expression, with C4.0.3 being defined by high CD103 expression (upper right panel). c, Unconventional T cells (C7 clusters) show variable CD161, CD8, and CD56 expression. C7.0.0 and C7.0.1 are both CD3+/CD161+, with C7.0.1 being CD8 positive while C7.0.0 is CD8 negative. C4.0.2 (NKT-like) are also CD3 positive, but express CD57/56 and are CD8/CD161 negative or low. The C4.0.2 cluster also showed distinctly low CD27 and high in CD45RA compared to the C7 clusters, making it more similar to the C4.0.1 TEMRA/Effector memory CD8+subset, except expressing CD56/CD57; this could also be consistent with a Terminal Effector phenotype. d, Hand gating strategy for CD20+CD38++ B cells using CITE-seq data (number of cells in the gate shown in red). e, Boxplot comparing the hand gated frequency of CD20+CD38 ++ B cells between 10 high and 10 low responders; p value from Wilcoxon one-tailed test. Boxplots’ center line corresponds to the median value, lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles); lower and upper whiskers extend from the box to the smallest or largest value correspondingly, but no further than 1.5x inter-quantile range. f, Similar to Fig. 5b (10 high versus 10 low responders) but for an independently obtained Type I IFN signature gene set (Supplementary Table 6: IFN gene set; see Methods). One or two asterisks denote significance with p < 0.05 or p < 0.01, respectively (Wilcoxon one-tailed test because we are interested in assessing whether the high responders are higher than the low responders; see also Supplementary Table 7). g, Results of the “drop out” analysis using CITE-seq data. The goal was to assess which cell clusters (or combination of clusters) that were individually significant delineators of high versus low responders were essential for prediction using the baseline signatures in bulk (i.e., simulating transcriptional data from PBMCs). The “pseudo bulk” results (average across all single cells for every subject in 10 low and 10 high responders) of the three signatures tested are shown on the first row. For subsequent rows cells from the indicated cell cluster(s) were dropped before repeating the pseudo-bulk analysis as in row 1. P values obtained from one-tailed Wilcoxon test: *: p < 0.05; **: p < 0.01.

Extended Data Fig. 9 Evaluating cytomegalovirus (CMV) correlates and pDC surface expression phenotypes strongly influenced by genetics from ref. 35 in high versus low responders.

Since CMV status is not available for the cohorts we evaluated, we evaluated whether CMV correlates are significantly different between high and low responders in the NIH influenza vaccination cohort. a, Boxplots comparing the frequency of CD4+ TEMRA cells between 10 low and 10 high responders using CITE-seq (left panel) and between 9 low and 8 high responders using flow cytometry (center panel) data. Wilcoxon two-tailed p values are shown. The third panel is a scatter plot of CITE-seq versus the flow cytometry cell frequencies (n = 17). Pearson correlation and two-tailed p values are shown. b, Same as (a) but for CD8+ TEMRA cells. c, Boxplots comparing the relative surface protein expression of CD86 and HLA-DR in pDCs (cluster C9) between 10 low and 10 high responders using CITE-seq data. Wilcoxon two-tailed p values are shown. For all box plots the center line corresponds to the median value, lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles); lower and upper whiskers extend from the box to the smallest or largest value correspondingly, but no further than 1.5x inter-quantile range.

Extended Data Fig. 10 Relationship between sex and baseline signatures.

a, Box plots comparing the TGSig and SLE-Sig scores in females versus males; here only subjects with CITE-seq data are included to indicate that sex was not a driver of the differences between 10 high and 10 low responders emerged from CITE-seq data analysis (see Fig. 4). Wilcoxon two-tailed p values comparing 11 females and 9 males are shown. For all box plots the center line corresponds to the median value, lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles); lower and upper whiskers extend from the box to the smallest or largest value correspondingly, but no further than 1.5x inter-quantile range. b, Same as (a) but including all high and low responders from original NIH study12. (day 0: 12 females and 12 males; day -7: 13 females and 11 males; day 70: 11 females and 12 males) c, same as (b) but including middle responders (i.e., all subjects in the study: day 0: 27 females and 16 males; day -7: 30 females and 16 males; day 70: 27 females and 17 males). d, Box plots comparing TGSig scores among low, middle, and high responders in males only (all subjects in the original NIH study12 used: day 0: 6/4/6 for low/middle/high responders, respectively; day -7: 5/5/6; day 70: 6/5/6). e, same as (d) but in females only (day 0: 5/15/17 for low/middle/high responders, respectively; day -7: 5/17/8; day 70: 5/16/6). All p values shown for two-group comparison were from the Wilcoxon two-tailed test; one-tailed p values shown for three-group comparison were from the Jonckheere trend test (with an a priori alternative hypothesis that the high responders >= middle responders >=low responders).

Supplementary information

Supplementary Information

Supplementary Figs. 1 and 2.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–10.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Kotliarov, Y., Sparks, R., Martins, A.J. et al. Broad immune activation underlies shared set point signatures for vaccine responsiveness in healthy individuals and disease activity in patients with lupus. Nat Med 26, 618–629 (2020).

Download citation

Further reading


Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing