Vaccines are among the most cost-effective public health interventions for preventing infection-induced morbidity and mortality, yet much remains to be learned regarding the mechanisms by which vaccines protect. Systems immunology combines traditional immunology with modern ‘omic profiling techniques and computational modeling to promote rapid and transformative advances in vaccinology and vaccine discovery. The NIH/NIAID Human Immunology Project Consortium (HIPC) has leveraged systems immunology approaches to identify molecular signatures associated with the immunogenicity of many vaccines. However, comparative analyses have been limited by the distributed nature of some data, potential batch effects across studies, and the absence of multiple relevant studies from non-HIPC groups in ImmPort. To support comparative analyses across different vaccines, we have created the Immune Signatures Data Resource, a compendium of standardized systems vaccinology datasets. This data resource is available through ImmuneSpace, along with code to reproduce the processing and batch normalization starting from the underlying study data in ImmPort and the Gene Expression Omnibus (GEO). The current release comprises 1405 participants from 53 cohorts profiling the response to 24 different vaccines. This novel systems vaccinology data release represents a valuable resource for comparative and meta-analyses that will accelerate our understanding of mechanisms underlying vaccine responses.
Measurement(s): Transcriptomics • Hemagglutination Inhibition Assay • IgG IgM IgA Total Measurement • Virus-neutralizing Antibody • ELISA
Technology Type(s): Microarray • RNA sequencing • Hemagglutination Inhibition Assay • ELISA • Microneutralization Assay • serum neutralization of viral infectivity assay
Sample Characteristic - Organism: Homo sapiens
Background & Summary
Vaccines, one of humanity’s greatest public health achievements, save millions of lives every year by preventing infectious diseases1,2. Despite their widespread use and efficacy, much remains to be learned regarding their molecular mechanisms of action. This is true both for vaccines against pandemic infections such as influenza3 and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)4, and for infections such as HIV for which there are currently no authorized or approved vaccines5,6,7. Elucidating the commonalities and differences in the immune responses induced by different vaccines, and their association with protective antibody responses, will provide deeper insight and a framework for the evidence-based design of better vaccines and vaccination strategies. Recent technologies provide tools to probe the immune response to vaccination and to integrate hierarchical levels of the biological system8. Termed systems vaccinology9, this application of systems biology tools has yielded new insights into the molecular mechanisms of vaccine-induced immunogenicity and protection10,11,12,13.
The National Institute of Allergy and Infectious Diseases (NIAID) established a multi-institutional consortium, the Human Immunology Project Consortium (HIPC)14,15, to characterize the immune system of diverse populations in response to stimuli such as vaccination, using high-dimensional ‘omic platforms and modern computational tools14. Since the inception of the consortium in 2010, HIPC members have published more than 500 articles, including many that describe molecular signatures associated with vaccine-induced protection. These include molecular signatures that predict the immunogenicity of vaccination against yellow fever16,17,18,19; seasonal influenza in healthy young adults, older adults20,21,22,23,24, and children25; shingles26,27; dengue28,29; and malaria30,31; as well as meta-analyses of signatures shared across different vaccines32,33. These signatures resulted from large-scale data analyses using high-throughput systems biology approaches coupled with detailed clinical phenotyping of well-characterized human cohorts.
Predicting immunogenicity from ‘omic signatures remains challenging, prompting methodological innovation to advance the field towards the promise of precision vaccination34,35,36. The factors that contribute to robust vaccination responses are highly complex and span multiple biological scales. The large collection of high-dimensional profiling datasets poses significant challenges for comparative analysis, including biological variability as well as data volume, technical noise, and diverse sample processing pipelines. Integrating cellular and molecular signatures from multiple sources to predict vaccine responses requires harmonization and normalization of the data37. Such big data thus present both challenges and opportunities, with the potential to contribute to precision medicine. The biological interpretation of the molecular features that correlate with robust responses is another key factor. Understanding how effective vaccines stimulate protective immune responses, and how these mechanisms differ between vaccine types and targeted pathogens, remains a substantial challenge for the field. Moreover, systems vaccinology has lacked a formal framework to standardize immune signatures gathered from diverse studies, creating a bottleneck for comparative analysis. To address these challenges, and in support of advances in systems vaccinology by HIPC and the broader scientific community, we present the Immune Signatures Data Resource, a compendium of systems vaccinology studies that enables standardized comparative analysis to identify molecular signatures that correlate with vaccination outcomes.
The current release of the Immune Signatures Data Resource consists of 4795 transcriptomic samples from 1405 participants, curated from 30 ImmPort studies (16 HIPC-related and 14 non-HIPC studies) (Fig. 2, Table 1). The transcriptomic data derive from 53 cohorts comprising 820 young adults (18–49 years old) and 585 older adults (≥50 years old). The resource covers 24 vaccines targeting 11 pathogens and spanning 6 vaccine types (Figs. 1b, 4a, Table 2), creating a critical mass of data that will serve as a valuable resource for the broader scientific community. Additionally, assembling and integrating these datasets enables the derivation of comparable signatures for each study, supporting comparative analysis of the underlying data.
Database background information and structure
Compatibility with ImmPort and ImmuneSpace, the central databases of the Human Immunology Project Consortium
Given the exponential growth in the number of datasets across multiple modalities, an urgent need emerged for data sharing across the broader scientific community. HIPC implements the NIH Data Sharing policy to promote the principles of Findability, Accessibility, Interoperability, and Reusability (FAIR) via ImmPort, created under the NIAID Division of Allergy, Immunology, and Transplantation (NIAID-DAIT). ImmPort (ImmPort.org) is an open repository of participant-level, large-scale human immunology data, designed to aid scientists with data standards and guidelines for efficient secondary analyses38,39. ImmPort facilitates the sharing of immunology studies, creating a centralized knowledge base and resources, and serves as the central data repository for HIPC. ImmuneSpace14,33 extends ImmPort, providing access to additional data (e.g., standardized gene expression matrices) and web-based R tools for data access, analysis, and reporting. Studies in the Immune Signatures Data Resource are archived through the Shared Data Portal on the ImmPort and ImmuneSpace repositories and may be updated over time. To provide a consistent data source for reproducible results, we also archived a static copy of the data as a “virtual study” in ImmuneSpace (Figs. 1a and 2).
Identification of vaccine study cohorts with transcriptomic profiles
Through a literature search conducted from July 2017 to January 2020 with terms including “Vaccine [AND] signatures”, “Vaccine [AND] gene expression”, and “Vaccine [AND] immune response [AND] gene expression”, we identified publications containing transcriptomic profiling datasets and vaccination responses. We found 16 HIPC-funded vaccinology studies in ImmPort with transcriptomic datasets and matching immune response outcomes, and surveyed HIPC centers for their publications. We excluded non-human study cohorts; cohorts with B cell or T cell transcriptomics, since most studies profiled PBMCs or whole blood; studies with a vaccine route other than intramuscular; studies of participants younger than our target age range (<18 years); and studies that lacked vaccine stimulation. Notably, we supplemented the HIPC data previously available in ImmPort by curating and submitting 14 additional human vaccination studies to ImmPort. For studies that were not in ImmPort/ImmuneSpace, we located the underlying data by searching public transcriptome databases (e.g., the Gene Expression Omnibus (GEO)) or by contacting study authors to request data access, which allowed us to submit the data to ImmPort on their behalf. These datasets were then made available via ImmuneSpace for standardization, preprocessing checks, and normalization. The standardized analytical pipeline supports reproducibility and allows future studies to be compared and correlated with publicly available immune response measurements. This process created the HIPC virtual study named the Immune Signatures Data Resource (Figs. 1a, 2).
Gene expression data processing pipeline
Data were read directly from ImmuneSpace using ImmuneSpaceR functions and subsequently preprocessed, quality controlled, and integrated using the following pipeline:
Quality control of microarray experiments
The arrayQualityMetrics R package40 was used for quality control and assurance of all microarray experiments (Fig. 3a). Outlier detection was based on the following statistics: i) the mean absolute difference of M-values (log-ratios) between each pair of arrays; ii) the Kolmogorov-Smirnov statistic Ka between each array’s signal intensity distribution and the distribution of the pooled data; and iii) Hoeffding’s statistic Da on the joint distribution of A (average) and M values for each array. Using pre-specified criteria from an established public microarray data reuse pipeline40, we flagged for removal any array that failed all three quality control statistics.
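The three-statistic rule can be illustrated with a small NumPy/SciPy sketch. It is not the arrayQualityMetrics implementation: it assumes a boxplot-style cutoff (above Q3 + 1.5 × IQR) for the distance and Kolmogorov-Smirnov statistics, and it takes Hoeffding's D values as a precomputed input compared against a hypothetical cutoff `d_cutoff`. All helper names are illustrative.

```python
import numpy as np
from scipy.stats import ks_2samp


def boxplot_outliers(stat):
    """Flag values above Q3 + 1.5 * IQR (boxplot rule, assumed as the cutoff)."""
    q1, q3 = np.percentile(stat, [25, 75])
    return stat > q3 + 1.5 * (q3 - q1)


def distance_stat(expr):
    """Sum over arrays of the mean absolute difference of M-values.

    expr: genes x arrays matrix of log2 intensities, so for single-channel
    arrays the M-value of a pair (i, j) is expr[:, i] - expr[:, j].
    """
    diffs = np.abs(expr[:, :, None] - expr[:, None, :]).mean(axis=0)
    return diffs.sum(axis=1)


def ks_stat(expr):
    """Kolmogorov-Smirnov statistic of each array versus the pooled intensities."""
    pooled = expr.ravel()
    return np.array([ks_2samp(expr[:, i], pooled).statistic
                     for i in range(expr.shape[1])])


def flag_arrays(expr, hoeffding_d, d_cutoff=0.15):
    """Boolean mask of arrays that fail all three statistics.

    hoeffding_d: per-array Hoeffding's D on the MA plot, computed elsewhere.
    d_cutoff: hypothetical threshold for that statistic.
    """
    fail_dist = boxplot_outliers(distance_stat(expr))
    fail_ks = boxplot_outliers(ks_stat(expr))
    fail_ma = np.asarray(hoeffding_d) > d_cutoff
    return fail_dist & fail_ks & fail_ma
```

Because an array is removed only when all three flags agree, one statistic firing on otherwise acceptable data does not by itself trigger removal, which keeps the rule conservative.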
Raw probe intensity data for Affymetrix studies were background-corrected and summarized using the RMA algorithm41, while the read.ilmn function (limma R package) was used to read and background-correct Illumina raw probe intensities. To integrate RNA-seq and microarray data, raw RNA-seq counts were transformed using the variance stabilizing transformation (VST). VST yields expression values that are normalized across samples and for library size and are approximately homoskedastic; after log2 transformation, they can be analyzed like microarray data using linear models in the limma framework. Expression data within each study were quantile normalized and log-transformed separately for each cohort and sample type.
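Within-cohort quantile normalization can be sketched as follows, assuming a genes × samples matrix and one cohort/sample-type label per column. `quantile_normalize` and `normalize_by_group` are hypothetical helpers, and ties are handled by simple rank order rather than by averaging.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize the columns (samples) of a genes x samples matrix.

    Each column's sorted values are replaced by the mean of the sorted
    columns, so every sample ends up with the same distribution.
    Ties are broken by order of appearance (a simplification).
    """
    order = np.argsort(x, axis=0)
    target = np.sort(x, axis=0).mean(axis=1)
    out = np.empty_like(x, dtype=float)
    for j in range(x.shape[1]):
        out[order[:, j], j] = target
    return out


def normalize_by_group(x, groups):
    """Apply quantile normalization separately within each cohort/sample-type group."""
    groups = np.asarray(groups)
    out = np.empty_like(x, dtype=float)
    for g in np.unique(groups):
        cols = groups == g
        out[:, cols] = quantile_normalize(x[:, cols])
    return out
```

For example, `np.log2(normalize_by_group(raw, cohort_labels))` would normalize positive raw intensities within each cohort and then log-transform them; the order of these two steps is an assumption of the sketch.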
We annotated the manufacturer probe IDs of the microarray platforms (Affymetrix and Illumina) with their corresponding gene aliases. Gene aliases were mapped to current gene symbols from the HUGO Gene Nomenclature Committee42 (accessed December 23, 2020). In the rare cases where a gene alias mapped to more than one gene symbol, the mapping was resolved as follows: i) if the alias mapped to itself as a symbol as well as to other symbols, it was mapped to itself; ii) if the alias mapped to multiple symbols that did not include itself, it was dropped from the study. As a result, the raw gene expression matrix was reduced to 10,086 HUGO gene aliases with known unique mappings.
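The two alias-resolution rules translate directly into a lookup function. In this sketch, `resolve_aliases` is a hypothetical helper that assumes the HGNC alias table has already been parsed into a dictionary from each alias to its set of candidate symbols; aliases with a single candidate are mapped to it directly.

```python
def resolve_aliases(alias_to_symbols):
    """Map each gene alias to a single current symbol using the two rules.

    alias_to_symbols: dict of alias -> set of candidate current symbols
    (building this dict from the HGNC download is not shown).
    Returns a dict alias -> symbol; unresolvable aliases are omitted.
    """
    resolved = {}
    for alias, symbols in alias_to_symbols.items():
        if len(symbols) == 1:
            # Unambiguous mapping.
            resolved[alias] = next(iter(symbols))
        elif alias in symbols:
            # Rule i: the alias is itself a current symbol, so keep it.
            resolved[alias] = alias
        # Rule ii: ambiguous and not self-mapping, so the alias is dropped.
    return resolved
```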
Gene-based expression profiles
Expression data were summarized from the probe level (microarray data) or gene-alias level (RNA-seq) to the canonical gene-symbol level. When several probes or gene aliases mapped to the same symbol, the one with the highest average expression across all samples within the matrix (cohort and sample type) was retained.
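Collapsing probes to gene symbols by highest mean expression might look like this minimal sketch; `collapse_to_genes` is a hypothetical name, and when two probes have exactly the same mean, the first one encountered is kept.

```python
import numpy as np


def collapse_to_genes(expr, probe_genes):
    """Keep, for each gene symbol, the probe with the highest mean expression.

    expr: probes x samples array for one matrix (cohort and sample type).
    probe_genes: gene symbol for each probe row.
    Returns (sorted gene symbols, genes x samples array).
    """
    means = expr.mean(axis=1)  # mean of each probe across all samples
    best = {}
    for i, gene in enumerate(probe_genes):
        if gene not in best or means[i] > means[best[gene]]:
            best[gene] = i
    genes = sorted(best)
    return genes, expr[[best[g] for g in genes], :]
```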
One of the main assumptions in expression analysis is that differences in gene expression across conditions involve a relatively small number of processes. As such, the overall expression distributions across conditions should be similar, and departures from this assumption are corrected, for example, using quantile normalization. This procedure usually builds a target distribution from all available samples, but the distributions in our collection were dissimilar because of the various platforms used. Pooling such samples yields a broad target distribution and introduces artifacts into the data (Fig. 3b,c). We therefore obtained the target distribution from samples profiled on Affymetrix platforms, which produced a well-defined distribution, and quantile normalized each sample in our collection to this target. Before cross-study normalization, 35,725 representative gene symbols were present; 25,639 of these were removed after normalization because they were not measured in all studies. This yielded a final expression matrix of 4795 samples from 1405 participants representing 10,086 genes (Fig. 2).
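Normalizing each sample to a fixed reference distribution differs from standard quantile normalization only in where the target comes from. The sketch below assumes the reference values (for example, averaged sorted intensities of the Affymetrix samples) are supplied as a 1-D array and linearly interpolates them when their length differs from the number of genes; both the helper name and the interpolation step are assumptions.

```python
import numpy as np


def normalize_to_target(x, target):
    """Quantile-normalize each column of x to a fixed reference distribution.

    x: genes x samples matrix.
    target: 1-D array of reference values (e.g. derived from Affymetrix samples).
    The sorted target is linearly interpolated onto a grid with one value per
    gene, so its length need not match x.shape[0].
    """
    ref = np.sort(np.asarray(target, dtype=float))
    n = x.shape[0]
    # Evenly spaced quantile positions for the n ranks, mapped onto the reference.
    probs = np.linspace(0, 1, n)
    ref_q = np.interp(probs, np.linspace(0, 1, ref.size), ref)
    out = np.empty_like(x, dtype=float)
    for j in range(x.shape[1]):
        # The k-th smallest value in column j receives the k-th reference quantile.
        out[np.argsort(x[:, j]), j] = ref_q
    return out
```

Each sample keeps its own gene ranking while adopting the shape of the reference, which is why a well-defined single-platform target avoids the broadened distribution produced by pooling all platforms.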
Determining and adjusting for technical confounders
We studied the primary sources of variation in the data, including the study effect (which also encompasses the impact of the different expression platforms, such as RNA-seq, Affymetrix arrays, and Illumina arrays), sample type (whole blood, PBMC), and demographics. We conducted principal component analysis (PCA) to visualize these associations in a two-dimensional space of principal components (PCs) and applied principal variance component analysis (PVCA)43 to quantify the variability attributable to different experimental conditions. This approach models the multivariate distribution of the PCs as a function of experimental factors and estimates the total variance explained by each factor via mixed-effect models. Because many studies included only one vaccine, temporal variation due to the vaccine response was confounded with the study effect. We therefore assessed the primary technical sources of variation using only pre-vaccination data: all studies enrolled healthy volunteers and obtained the first biosample before vaccination, so these baseline data should not be affected by the targeted pathogen or vaccine type.
Platform, study, and sample type were identified as significant sources of variation in the gene expression matrix. Their effects were estimated by modeling baseline gene expression (where no vaccine or time-point effect exists) with a linear model in the limma framework, with age, Y-chromosome gene presence (biological sex), study (batch factor), sample type (whole blood/PBMC), and platform (feature set vendor, e.g., Affymetrix) as additive effects. The estimated study, platform, and cell-type effects were then removed from the entire expression matrix. Three studies (SDY1276, SDY1264, SDY180) contained multiple cohorts, and each cohort was treated as a separate study.
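The estimate-on-baseline, remove-everywhere logic can be sketched with ordinary least squares in NumPy. This is a simplified stand-in for the limma-based fit, not the authors' code: `remove_batch_effects` and `dummies` are hypothetical helpers, categorical batch factors are dummy-coded with the first level as reference, and effects of factors that are fully nested in one another (for example, a platform used by a single study) are not separately identifiable, so only their combined contribution is meaningful.

```python
import numpy as np


def dummies(labels):
    """One-hot encode a categorical vector, dropping the first level as reference."""
    labels = np.asarray(labels)
    levels = np.unique(labels)
    if levels.size < 2:
        return np.zeros((labels.size, 0))
    return np.column_stack([(labels == lv).astype(float) for lv in levels[1:]])


def remove_batch_effects(expr, baseline, covariates, batch_factors):
    """Estimate batch effects on baseline samples and remove them from all samples.

    expr: genes x samples matrix (log2 scale).
    baseline: boolean mask of pre-vaccination samples used for fitting.
    covariates: samples x p array of biological covariates kept in the data
        (e.g. age, imputed Y-chromosome presence).
    batch_factors: list of per-sample label vectors (e.g. study, platform,
        sample type) whose effects are removed.
    """
    n = expr.shape[1]
    bio = np.column_stack([np.ones(n), covariates])
    batch = np.column_stack([dummies(f) for f in batch_factors])
    design = np.column_stack([bio, batch])
    # Ordinary least squares for all genes at once, on baseline samples only.
    coef, *_ = np.linalg.lstsq(design[baseline], expr[:, baseline].T, rcond=None)
    batch_coef = coef[bio.shape[1]:]  # rows belonging to the batch columns
    # Subtract only the batch contribution from every sample.
    return expr - (batch @ batch_coef).T
```

Only the batch columns are subtracted, so age and sex effects, as well as post-vaccination changes that the baseline-only fit never saw, remain in the adjusted matrix.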
Biological sex imputation
Imputation of biological sex, defined by the presence of a Y-chromosome, was based on the expression profiles of 13 Y-chromosome genes. Within each study, multidimensional scaling was first applied to the Y-chromosome gene expression profiles, and k-means clustering was then used to partition samples into two groups. Participants in the cluster with higher mean expression were considered male (i.e., Y-chromosome present), and those in the cluster with lower expression were considered female (i.e., Y-chromosome absent). The consistency of the Y-chromosome assignment across time points was verified (Fig. 3d). In the few cases where imputation disagreed across time points, the reported sex was used; if no sex was reported, the majority assignment across time points was used.
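The sex-imputation steps (MDS on Y-chromosome genes, two-cluster k-means, the higher-expression cluster called male, then reconciliation across time points) can be sketched in plain NumPy. Classical MDS, initializing k-means at the two most distant samples, encoding reported sex as a boolean `reported_male`, and all function names are assumptions of this sketch; a tied majority vote defaults to "Y absent" here, a choice not stated in the text.

```python
import numpy as np


def classical_mds(x, k=2):
    """Classical (Torgerson) MDS of the rows of x using Euclidean distances."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    b = -0.5 * j @ d2 @ j
    vals, vecs = np.linalg.eigh(b)  # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))


def two_means(coords, n_iter=100):
    """Minimal k-means with k = 2, initialized at the two most distant points."""
    d = ((coords[:, None] - coords[None, :]) ** 2).sum(axis=-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    centers = coords[[i, j]].astype(float)
    for _ in range(n_iter):
        labels = ((coords[:, None] - centers[None]) ** 2).sum(axis=-1).argmin(axis=1)
        new = np.array([coords[labels == c].mean(axis=0) for c in (0, 1)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def impute_sex(y_expr):
    """y_expr: samples x Y-genes matrix for one study. True means Y present."""
    labels = two_means(classical_mds(y_expr))
    cluster_means = [y_expr[labels == c].mean() for c in (0, 1)]
    return labels == int(np.argmax(cluster_means))


def resolve_participant(calls, reported_male=None):
    """Combine one participant's per-time-point calls (True = Y present)."""
    if all(calls) or not any(calls):
        return bool(calls[0])  # consistent across time points
    if reported_male is not None:
        return reported_male  # disagreement: fall back on reported sex
    return sum(calls) > len(calls) / 2  # majority rule; ties -> absent
```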
Age imputation for studies without reported ages (SDY1260, SDY1264, SDY1293, SDY1294, SDY1364, SDY1370, SDY1373, SDY984) employed the RAPToR R package (v1.1.5)44. RAPToR takes a reference set of gene expression time series with reported ages and interpolates it into a near-continuous, high-temporal-resolution reference. Transcriptomic profiles of participants without reported ages were compared with this interpolated reference via correlation profiles, providing age estimates for each sample. Finally, random subsets of genes from each participant’s transcriptomic profile were bootstrapped to obtain a confidence interval for the imputed age. We generated the reference dataset from the transcriptomic profiles of the 21 studies in our resource for which age was reported. These studies were split into younger (age <50) and older (age ≥50) cohorts, so two separate models were generated, and only baseline transcriptomic profiles were used in the reference. Because RAPToR can also incorporate phenotypic data into the interpolation model, each possible combination of phenotypic features was tested. These features included the top variables identified in our PVCA tests as well as demographic and technical information such as reported age, cohort and matrix type, Y-chromosome imputation, study accession, feature set vendor and platform names, and cell type. For each combination, RAPToR predicted the ages of participants in the 21 studies with known age, and goodness of fit was evaluated by the coefficient of determination (R²) and confirmed via the root mean square error (RMSE). The best models for the younger and older cohorts were then used to impute ages for the studies without reported age (Fig. 3e,f).
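RAPToR itself is an R package, so its interpolation is not reproduced here. The sketch below covers only the model-selection step: given the ages each candidate covariate combination predicted for participants of known age, it computes R² (assumed here to be 1 − SS_res/SS_tot) and RMSE and keeps the model with the lowest RMSE. The function names and input layout are hypothetical.

```python
import numpy as np


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(observed, predicted):
    """Root mean square error."""
    diff = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def select_model(observed, predictions_by_model):
    """Pick the candidate whose predicted ages have the lowest RMSE.

    predictions_by_model: dict mapping a model label (e.g. a combination of
    phenotypic covariates) to the ages it predicted for participants of known age.
    Returns (best label, {label: (rmse, r_squared)}).
    """
    scores = {m: (rmse(observed, p), r_squared(observed, p))
              for m, p in predictions_by_model.items()}
    best = min(scores, key=lambda m: scores[m][0])
    return best, scores
```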
Immune response datasets processing pipeline
To identify molecular signatures that correlate with vaccine immunogenicity, we included immune response readouts in this data resource. For studies missing vaccine response endpoints in their public data deposition, we contacted study authors and requested available antibody response measures to vaccine antigens. Once shared, these data were submitted to ImmPort and linked to the relevant studies. The readouts include neutralizing antibody (NAB) titers, hemagglutination inhibition (HAI) assay results for influenza studies, and IgG ELISA results. For participants whose humoral immune response was measured with multiple assays, preference was given to HAI for influenza studies or NAB for non-influenza studies, followed by IgG ELISA. Antibody measures were normalized within each study as the fold change between the post-vaccination time point (generally day 28 or day 30) and the baseline measurement. For influenza studies in which the vaccine included multiple strains, the fold change between post-vaccination and baseline was calculated for each strain, and the maximum fold change (MFC) across strains was selected33. Because baseline antibody (Ab) levels vary, for example owing to pre-existing immune memory against influenza, we also computed the maximum residual after baseline adjustment (maxRBA), which adjusts each strain’s response for the baseline Ab level and takes the maximum residual across all vaccine strains, using the R package titer20. A total of 30 studies with 1405 participants and 4795 samples have both transcriptomic and immune response data available (Fig. 2). This dataset enables researchers to carry out comparative analyses of immunogenicity as well as prediction of response quality across multiple vaccines.
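The two titer summaries can be sketched on the log2 scale as follows. `max_fold_change` takes the maximum log2 fold change over strains, which ranks participants identically to the maximum raw fold change. `max_rba` is a simplified, assumed version of the baseline adjustment: it fits an ordinary linear regression of log2 fold change on log2 baseline titer for each strain and takes the maximum residual. The titer package may use a different model, so treat this as illustrative only.

```python
import numpy as np


def max_fold_change(pre, post):
    """Maximum over strains of log2(post / pre) for each participant.

    pre, post: participants x strains arrays of titers.
    """
    return np.max(np.log2(post / pre), axis=1)


def max_rba(pre, post):
    """Maximum residual after baseline adjustment (simple linear version).

    For each strain, log2 fold change is regressed on log2 baseline titer
    across participants; the residuals are then maximized over strains.
    """
    log_pre = np.log2(pre)
    log_fc = np.log2(post / pre)
    resid = np.empty_like(log_fc, dtype=float)
    for s in range(pre.shape[1]):
        slope, intercept = np.polyfit(log_pre[:, s], log_fc[:, s], 1)
        resid[:, s] = log_fc[:, s] - (intercept + slope * log_pre[:, s])
    return resid.max(axis=1)
```

Regressing on the baseline titer removes the tendency of participants with high pre-existing titers to show smaller fold changes, which is the motivation the text gives for maxRBA.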
The Immune Signatures Data Resource is available for download from figshare45 (https://doi.org/10.6084/m9.figshare.17096978). The data are hosted on ImmuneSpace and can be accessed in full detail via the R package ImmuneSpaceR (https://rglab.github.io/ImmuneSpaceR/) or downloaded from the IS2 study on ImmuneSpace (https://www.ImmuneSpace.org/is2.url). A summary of the datasets17,18,20,21,22,24,26,32,46,47,48,49,50,51,52,53,54,55,56,57,58, with their corresponding study IDs, accession numbers, and DOIs, is provided in Table 3.
Quality control and assurance
For global quality control across all public microarray data, we used a well-established pipeline available through the arrayQualityMetrics R package40. Using pre-specified criteria from an existing public microarray data reuse pipeline59, arrays that failed all three calculated quality control statistics were flagged for removal (see Methods). Consistent with the standard practice of performing such quality control before downstream analysis and submission to the Gene Expression Omnibus, none of the samples were outliers by all three statistics (Fig. 3a). As expected for data from published, peer-reviewed studies, all identified studies passed quality assurance with arrayQualityMetrics.
Y-chromosomal presence and age imputation
A few studies were missing information on sex or age. To achieve data completeness, we imputed biological sex from the presence of the Y-chromosome inferred from gene expression, and imputed age when the variable was missing or reported only as a broad range of values. Age imputation used the RAPToR tool with the 21 studies with reported age to define the best predictive model separately for the younger (age <50 years) and older (age ≥50 years) cohorts. For the younger cohort, the model with the lowest root mean square error (RMSE) was X ~ age_reported + matrix, with a coefficient of determination of R² = 0.367 (Fig. 3e); for the older cohort, the highest-performing model yielded R² = 0.536 (Fig. 3f).
Definition of vaccination studies transcriptomic cohort
Data preprocessing in ImmuneSpace yielded a total of 30 studies and 59 cohorts, with 1482 participants and 5413 samples. After preprocessing and quality control, we further assessed the identified cohorts as shown in the flow diagram (Fig. 2). This curation included: i) removing participants who were not relevant to the objective (n = 34); ii) removing samples with inconsistencies in the time-point design (n = 178); and iii) removing participants with no baseline expression data (n = 42). Some studies, such as SDY1368 and SDY67, were dropped from the normalized datasets because they did not include participants within our target age range (18–50 years). In summary, the final Immune Signatures Data Resource contains 53 cohorts from 30 studies, with 1405 participants and 4795 samples.
Assessment and adjustment of the batch effects
We evaluated the main sources of variation in the gene expression matrix using baseline samples, to identify and adjust for technical confounders: platform (RNA-seq, Affymetrix arrays, Illumina arrays, etc.), study, and specimen type (e.g., whole blood vs. PBMCs). Because all studies enrolled healthy volunteers and the first sample was taken before vaccination, pathogen and vaccine type would not affect the baseline data. Figure 3b shows that samples cluster strongly by study, and that studies are in turn grouped by platform type. Study and platform together accounted for the vast majority (95%) of the variation, followed by specimen type (3.6%). It is thus essential to correct the data for these major effects before any analytical use (see Methods for further details). The study-, platform-, and specimen-type-specific effects were estimated from baseline expression alone using a linear model that also included age and Y-chromosome presence as additive effects. Once estimated, these effects were removed from the entire expression matrix. Figure 3b shows that these effects can be successfully adjusted, yielding an expression matrix that is free of most technical biases induced by laboratory and cell-type effects.
Immune signatures transcriptomics and immune response datasets
We report the total number of assay samples from the transcriptomic and immune response datasets, tallied by targeted pathogen and vaccine type (Fig. 4a). We captured approximately 3000 HAI antibody titer results from influenza studies, measured with the standard HAI assay before vaccination and at multiple time points after vaccination, depending on the study. Mean titers were calculated for the reported virus strains and were based on the highest dilution reported at day 28–30 post-vaccination. In addition, neutralizing antibody (NAB) titers and IgG ELISA results specific to each pathogen were determined by each study and are summarized in Fig. 4a. The transcriptomic dataset covers multiple time points, from 7 days before vaccination up to 180 days after vaccination (Fig. 4b). While most datasets focus on young adults (18–50 years old), the resource also includes studies that profile older adults following hepatitis B, influenza, and varicella vaccination (Fig. 4c). The Euler diagram (Fig. 4d) shows the overlap between participants with transcriptomic data and those with one or more immune response datasets.
Heterogeneity of the immune response across targeted pathogens and vaccine types was reflected in the varied longitudinal trajectories of HAI and NAB titers (Fig. 5a,b). HAI and NAB titers generally increased by 14–28 days after vaccination but waned at different times for each vaccine (Fig. 5a,b). Changes in NAB titers after vaccination differed significantly across the 5 combinations of targeted pathogen and vaccine type for which these measurements were reported (ANOVA p < 10⁻¹⁰), with significant differences between all groups except meningococcus and yellow fever vaccines (Fig. 5c). Some influenza vaccination studies reported both HAI and NAB measures of immunogenicity, and the vaccination-induced changes in these titers were significantly positively correlated across participants (Spearman’s rho = 0.45, p < 10⁻¹⁰) (Fig. 5d).
The expression data and accompanying metadata are available in different formats and options to ease usage. Data are provided as standard expression set (eSet) objects, the R/Bioconductor structure that unifies expression values, metadata, and gene annotation. Both normalized and batch-adjusted data are available (Table 4). Users interested in a single study, or those planning to work exclusively with within-participant changes, may opt for the normalized data without batch adjustment. For comparing time points across studies or developing algorithms that use expression data, the batch-corrected matrices should be used. Imputed age values for participants with no reported age are included to facilitate the use of age as a covariate in future analyses. Such analyses can be carried out with the complete dataset and followed by a sensitivity analysis restricted to participants with reported age. Expression sets that include the corresponding immune response for each participant are provided as separate eSets labeled with the response. The selected immune response outcome for each study is summarized in Table 3.
The source code for the Immune Signatures Data Resource and all data are available in ImmuneSpace (https://www.immunespace.org/is2.url), Zenodo60 (https://doi.org/10.5281/zenodo.5706261), and figshare45 (https://doi.org/10.6084/m9.figshare.17096978). Preprocessing code and supplementary data are available in full detail in the ImmuneSignatures2 R package hosted on GitHub (https://github.com/RGLab/ImmuneSignatures2).
Piot, P. et al. Immunization: vital progress, unfinished agenda. Nature 575, 119–129, https://doi.org/10.1038/s41586-019-1656-7 (2019).
Pulendran, B. Systems vaccinology: probing humanity’s diverse immune systems with vaccines. Proc Natl Acad Sci USA 111, 12300–12306, https://doi.org/10.1073/pnas.1400476111 (2014).
Fineberg, H. V. Pandemic preparedness and response–lessons from the H1N1 influenza of 2009. N Engl J Med 370, 1335–1342, https://doi.org/10.1056/NEJMra1208802 (2014).
Fauci, A. S., Lane, H. C. & Redfield, R. R. Covid-19 - Navigating the Uncharted. N Engl J Med https://doi.org/10.1056/NEJMe2002387 (2020).
Fauci, A. S. An HIV Vaccine Is Essential for Ending the HIV/AIDS Pandemic. JAMA 318, 1535–1536, https://doi.org/10.1001/jama.2017.13505 (2017).
Fauci, A. S., Folkers, G. K. & Marston, H. D. Ending the global HIV/AIDS pandemic: the critical role of an HIV vaccine. Clin Infect Dis 59(Suppl 2), S80–84, https://doi.org/10.1093/cid/ciu420 (2014).
Fauci, A. S. & Marston, H. D. Ending the HIV-AIDS Pandemic–Follow the Science. N Engl J Med 373, 2197–2199, https://doi.org/10.1056/NEJMp1502020 (2015).
Diercks, A. & Aderem, A. Systems approaches to dissecting immunity. Curr Top Microbiol Immunol 363, 1–19, https://doi.org/10.1007/82_2012_246 (2013).
Pulendran, B., Li, S. & Nakaya, H. I. Systems Vaccinology. Immunity 33, 516–529, https://doi.org/10.1016/j.immuni.2010.10.006 (2010).
Tsang, J. S. et al. Improving Vaccine-Induced Immunity: Can Baseline Predict Outcome? Trends Immunol 41, 457–465, https://doi.org/10.1016/j.it.2020.04.001 (2020).
Nakaya, H. I., Li, S. & Pulendran, B. Systems vaccinology: learning to compute the behavior of vaccine induced immunity. Wiley Interdiscip Rev Syst Biol Med 4, 193–205, https://doi.org/10.1002/wsbm.163 (2012).
Nakaya, H. I. & Pulendran, B. Systems vaccinology: its promise and challenge for HIV vaccine development. Curr Opin HIV AIDS 7, 24–31, https://doi.org/10.1097/COH.0b013e32834dc37b (2012).
Zak, D. E. & Aderem, A. Overcoming limitations in the systems vaccinology approach: a pathway for accelerated HIV vaccine development. Curr Opin HIV AIDS 7, 58–63, https://doi.org/10.1097/COH.0b013e32834ddd31 (2012).
Brusic, V., Gottardo, R., Kleinstein, S. H., Davis, M. M. & HIPC steering committee. Computational resources for high-dimensional immune analysis from the Human Immunology Project Consortium. Nat Biotechnol 32, 146–148, https://doi.org/10.1038/nbt.2777 (2014).
Poland, G. A., Quill, H. & Togias, A. Understanding the human immune system in the 21st century: the Human Immunology Project Consortium. Vaccine 31, 2911–2912, https://doi.org/10.1016/j.vaccine.2013.04.043 (2013).
Muyanja, E. et al. Immune activation alters cellular and humoral responses to yellow fever 17D vaccine. J Clin Invest 124, 3147–3158, https://doi.org/10.1172/JCI75429 (2014).
Gaucher, D. et al. Yellow fever vaccine induces integrated multilineage and polyfunctional immune responses. J Exp Med 205, 3119–3131, https://doi.org/10.1084/jem.20082292 (2008).
Querec, T. D. et al. Systems biology approach predicts immunogenicity of the yellow fever vaccine in humans. Nat Immunol 10, 116–125, https://doi.org/10.1038/ni.1688 (2009).
Querec, T. et al. Yellow fever vaccine YF-17D activates multiple dendritic cell subsets via TLR2, 7, 8, and 9 to stimulate polyvalent immunity. J Exp Med 203, 413–424, https://doi.org/10.1084/jem.20051720 (2006).
Avey, S. et al. Seasonal Variability and Shared Molecular Signatures of Inactivated Influenza Vaccination in Young and Older Adults. J Immunol 204, 1661–1673, https://doi.org/10.4049/jimmunol.1900922 (2020).
Nakaya, H. I. et al. Systems Analysis of Immunity to Influenza Vaccination across Multiple Years and in Diverse Populations Reveals Shared Molecular Signatures. Immunity 43, 1186–1198, https://doi.org/10.1016/j.immuni.2015.11.012 (2015).
Nakaya, H. I. et al. Systems biology of vaccination for seasonal influenza in humans. Nat Immunol 12, 786–795, https://doi.org/10.1038/ni.2067 (2011).
Oh, J. Z. et al. TLR5-Mediated Sensing of Gut Microbiota Is Necessary for Antibody Responses to Seasonal Influenza Vaccination. Immunity 41, 478–492, https://doi.org/10.1016/j.immuni.2014.08.009 (2014).
Thakar, J. et al. Aging-dependent alterations in gene expression and a mitochondrial signature of responsiveness to human influenza vaccination. Aging (Albany NY) 7, 38–52, https://doi.org/10.18632/aging.100720 (2015).
Nakaya, H. I. et al. Systems biology of immunity to MF59-adjuvanted versus nonadjuvanted trivalent seasonal influenza vaccines in early childhood. Proc Natl Acad Sci USA 113, 1853–1858, https://doi.org/10.1073/pnas.1519690113 (2016).
Li, S. et al. Metabolic Phenotypes of Response to Vaccination in Humans. Cell 169, 862–877.e17, https://doi.org/10.1016/j.cell.2017.04.026 (2017).
Sullivan, N. L. et al. Breadth and Functionality of Varicella-Zoster Virus Glycoprotein-Specific Antibodies Identified after Zostavax Vaccination in Humans. J Virol 92, https://doi.org/10.1128/JVI.00269-18 (2018).
Michlmayr, D. et al. Comprehensive Immunoprofiling of Pediatric Zika Reveals Key Role for Monocytes in the Acute Phase and No Effect of Prior Dengue Virus Infection. Cell Rep 31, 107569, https://doi.org/10.1016/j.celrep.2020.107569 (2020).
Katzelnick, L. C. et al. Antibody-dependent enhancement of severe dengue disease in humans. Science 358, 929–932, https://doi.org/10.1126/science.aan6836 (2017).
Kazmin, D. et al. Systems analysis of protective immune responses to RTS,S malaria vaccination in humans. Proc Natl Acad Sci USA 114, 2425–2430, https://doi.org/10.1073/pnas.1621489114 (2017).
Mpina, M. et al. Controlled Human Malaria Infection Leads to Long-Lasting Changes in Innate and Innate-like Lymphocyte Populations. J Immunol 199, 107–118, https://doi.org/10.4049/jimmunol.1601989 (2017).
Li, S. et al. Molecular signatures of antibody responses derived from a systems biology study of five human vaccines. Nat Immunol 15, 195–204, https://doi.org/10.1038/ni.2789 (2014).
HIPC-CHI Signatures Project Team & HIPC-I Consortium. Multicohort analysis reveals baseline transcriptional predictors of influenza vaccination responses. Sci Immunol 2, https://doi.org/10.1126/sciimmunol.aal4656 (2017).
Azuaje, F. Computational models for predicting drug responses in cancer research. Brief Bioinform 18, 820–829, https://doi.org/10.1093/bib/bbw065 (2017).
Jia, S., Li, J., Liu, Y. & Zhu, F. Precision immunization: a new trend in human vaccination. Hum Vaccin Immunother 16, 513–522, https://doi.org/10.1080/21645515.2019.1670123 (2020).
Gao, A. et al. Predicting the Immunogenicity of T cell epitopes: From HIV to SARS-CoV-2. bioRxiv, https://doi.org/10.1101/2020.05.14.095885 (2020).
Chaussabel, D. Assessment of immune status using blood transcriptomics and potential implications for global health. Semin Immunol 27, 58–66, https://doi.org/10.1016/j.smim.2015.03.002 (2015).
Bhattacharya, S. et al. ImmPort: disseminating data to the public for the future of immunology. Immunol Res 58, 234–239, https://doi.org/10.1007/s12026-014-8516-1 (2014).
Bhattacharya, S. et al. ImmPort, toward repurposing of open access immunological assay data for translational and clinical research. Sci Data 5, 180015, https://doi.org/10.1038/sdata.2018.15 (2018).
Kauffmann, A., Gentleman, R. & Huber, W. arrayQualityMetrics–a bioconductor package for quality assessment of microarray data. Bioinformatics 25, 415–416, https://doi.org/10.1093/bioinformatics/btn647 (2009).
Irizarry, R. A. et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249–264, https://doi.org/10.1093/biostatistics/4.2.249 (2003).
Bruford, E. A. et al. Guidelines for human gene nomenclature. Nat Genet 52, 754–758, https://doi.org/10.1038/s41588-020-0669-3 (2020).
Boedigheimer, M. J. et al. Sources of variation in baseline gene expression levels from toxicogenomics study control animals across multiple laboratories. BMC Genomics 9, 285, https://doi.org/10.1186/1471-2164-9-285 (2008).
Bulteau, R. & Francesconi, M. Real age prediction from the transcriptome with RAPToR. bioRxiv 2021.09.07.459270, https://doi.org/10.1101/2021.09.07.459270 (2021).
Diray-Arce, J. et al. HIPC-II Immune Signatures Data Resource. figshare https://doi.org/10.6084/m9.figshare.17096978.v1 (2021).
Rechtien, A. et al. Systems Vaccinology Identifies an Early Innate Immune Signature as a Correlate of Antibody Responses to the Ebola Vaccine rVSV-ZEBOV. Cell Rep 20, 2251–2261, https://doi.org/10.1016/j.celrep.2017.08.023 (2017).
Zak, D. E. et al. Merck Ad5/HIV induces broad innate immune activation that predicts CD8(+) T-cell responses but is attenuated by preexisting Ad5 immunity. Proc Natl Acad Sci USA 109, E3503–3512, https://doi.org/10.1073/pnas.1208972109 (2012).
Bucasas, K. L. et al. Early patterns of gene expression correlate with the humoral immune response to influenza vaccination in humans. J Infect Dis 203, 921–929, https://doi.org/10.1093/infdis/jiq156 (2011).
Obermoser, G. et al. Systems scale interactive exploration reveals quantitative and qualitative differences in response to influenza and pneumococcal vaccines. Immunity 38, 831–844, https://doi.org/10.1016/j.immuni.2012.12.008 (2013).
Furman, D. et al. Apoptosis and other immune biomarkers predict influenza vaccine responsiveness. Mol Syst Biol 9, 659, https://doi.org/10.1038/msb.2013.15 (2013).
Henn, A. D. et al. High-resolution temporal response patterns to influenza vaccine reveal a distinct human plasma cell gene signature. Sci Rep 3, 2327, https://doi.org/10.1038/srep02327 (2013).
Tsang, J. S. et al. Global analyses of human immune variation reveal baseline predictors of postvaccination responses. Cell 157, 499–513, https://doi.org/10.1016/j.cell.2014.03.031 (2014).
Vahey, M. T. et al. Expression of genes associated with immunoproteasome processing of major histocompatibility complex peptides is indicative of protection with adjuvanted RTS,S malaria vaccine. J Infect Dis 201, 580–589, https://doi.org/10.1086/650310 (2010).
O’Connor, D. et al. High-dimensional assessment of B-cell responses to quadrivalent meningococcal conjugate and plain polysaccharide vaccine. Genome Med 9, 11, https://doi.org/10.1186/s13073-017-0400-x (2017).
Kennedy, J. S. et al. Safety and immunogenicity of LC16m8, an attenuated smallpox vaccine in vaccinia-naive adults. J Infect Dis 204, 1395–1402, https://doi.org/10.1093/infdis/jir527 (2011).
Matsumiya, M. et al. Roles for Treg expansion and HMGB1 signaling through the TLR1-2-6 axis in determining the magnitude of the antigen-specific immune response to MVA85A. PLoS One 8, e67922, https://doi.org/10.1371/journal.pone.0067922 (2013).
Hou, J. et al. A Systems Vaccinology Approach Reveals Temporal Transcriptomic Changes of Immune Responses to the Yellow Fever 17D Vaccine. J Immunol 199, 1476–1489, https://doi.org/10.4049/jimmunol.1700083 (2017).
Fourati, S. et al. Pre-vaccination inflammation and B-cell signalling predict age-related hyporesponse to hepatitis B vaccination. Nat Commun 7, 10369, https://doi.org/10.1038/ncomms10369 (2016).
Shah, N. et al. A crowdsourcing approach for reusing and meta-analyzing gene expression data. Nat Biotechnol 34, 803–806, https://doi.org/10.1038/nbt.3603 (2016).
RGLab/ImmuneSignatures2: 1.0.6, Zenodo, https://doi.org/10.5281/zenodo.5706261 (2021).
This research was conducted within the Human Immunology Project Consortium (HIPC) and supported by the National Institute of Allergy and Infectious Diseases. This work was supported in part by NIH grants U19AI128949, U19AI118608, U19AI118626, U19AI089992, U19AI090023, U19AI128914, U19AI118610, and U19AI128913. The HIPC projects are listed at https://www.immuneprofiling.org/hipc/page/showPage?pg=projects. This work was also supported in part by the Canadian Institutes of Health Research [funding reference number FDN-154287].
S.H.K. receives consulting fees from Northrop Grumman and Peraton. O.L. is an inventor on several patents relating to vaccine adjuvants and human in vitro systems predicting vaccine action. R.G. has received consulting income from Illumina and Takeda and declares ownership in Ozette Technologies and Modulus Therapeutics. The other authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Diray-Arce, J., Miller, H.E.R., Henrich, E. et al. The Immune Signatures data resource, a compendium of systems vaccinology datasets. Sci Data 9, 635 (2022). https://doi.org/10.1038/s41597-022-01714-7