Vaccines have improved human health enormously. The trouble is, we are not quite sure how they work. While we know a great deal about some components of the vaccine response, the answers to many important questions remain murky: for example, which features of the vaccine response are required for immunological protection, or whether different vaccines induce similar patterns of immunity. Without answers to these questions, clinical vaccine discovery today looks more like the trial-and-error efforts of Pasteur's era than the rational design approach of modern drug development. As a result, progress in coming up with effective vaccines against diseases such as HIV and tuberculosis remains slow. To accelerate vaccine development, several groups have identified gene-expression signatures present in human peripheral blood mononuclear cells (PBMCs) that predict immune responses to yellow fever1,2 and influenza vaccines3. However, the extent of vaccine-induced change in PBMC profiles can be small, and the number of genes measured in a typical gene-expression profiling experiment is large4, making it difficult to distinguish the signal from the noise. In this issue, Li et al.5 provide a new computational resource to make identifying subtle signatures easier and use it to compare the signatures elicited by five different vaccines.

The simplest way to analyze gene-expression profiles is to identify individual genes that are differentially expressed between phenotypes or conditions of interest. This can be achieved by comparing profiles from, say, samples obtained before and after vaccination, or samples from subjects with varying degrees of vaccine-induced antibody responses, and identifying genes whose difference in expression is greater than would be expected by chance. This gene-by-gene approach has been very useful in identifying specific genes that regulate the immune response6, but it tends to 'ignore' more complex and subtle patterns evident in genome-scale expression data7.
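The gene-by-gene comparison described above can be sketched in a few lines. This is a minimal illustration, not the analysis used in any of the cited studies: the gene names and expression values are hypothetical, and Welch's t statistic is simply one common choice of test. A real analysis would also convert the statistics to P values and correct for the thousands of genes tested at once.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(before, after):
    """Welch's t statistic for the change in mean expression (after - before)."""
    se = sqrt(variance(before) / len(before) + variance(after) / len(after))
    return (mean(after) - mean(before)) / se

# Hypothetical log2 expression values, four subjects per time point (illustrative only).
profiles = {
    "GENE_A": ([5.0, 5.2, 4.9, 5.1], [7.0, 7.3, 6.8, 7.1]),  # clearly induced
    "GENE_B": ([6.0, 6.4, 5.8, 6.1], [6.1, 6.3, 5.9, 6.2]),  # essentially unchanged
}

# Score every gene independently, then rank by the size of the statistic.
t_scores = {gene: welch_t(pre, post) for gene, (pre, post) in profiles.items()}
ranked = sorted(t_scores, key=lambda g: abs(t_scores[g]), reverse=True)
```

Because each gene is scored in isolation, nothing in this procedure can register a pattern that is spread thinly across many genes.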

Li et al.5 have developed a new resource to help identify biologically meaningful patterns of gene expression in PBMC profiles from vaccinated subjects, and they use it to ask whether different vaccines elicit unique or shared patterns of immunity. Their study provides both a technical resource for transcriptional analysis and the first comparison of gene-expression signatures elicited by different vaccines.

The technical resource they have developed is a compendium of coordinately expressed modules of genes. They identified the modules by analyzing patterns of gene expression present in a very large data set created by merging 30,000 PBMC gene-expression profiles from 500 published studies. They used this reference collection of PBMC expression profiles to identify groups of genes whose expression correlates with each other. The graphical representation of this long list of gene-gene correlations (termed interactions) can be visualized as the familiar 'hairball' beloved of systems biologists but few others. In interaction networks, each gene is represented by a node linked by connections (edges) to other genes with correlated expression levels (Fig. 1). Because the expression levels of many genes are often closely correlated, interaction networks form a dense thicket of connections. Modules can be identified within such a network as groups of genes that have more connections between them than to other genes. Using such an approach with the interaction network defined from 30,000 PBMC samples, Li et al.5 refined and extracted a set of 334 “blood transcriptional modules.”

Figure 1: Identifying gene modules from gene-expression data.

Biological functions of the cell are carried out by groups of genes (modules) that often share correlated levels of expression. In a typical experiment, expression levels of thousands of genes (G1, G2,...) are measured in many samples (S1, S2,...). Correlation in expression level between each gene and all others is tested with linear or nonlinear tests of association. Networks are modeled by connecting genes (represented as nodes) with lines (edges) to correlated genes. Genes whose expression levels do not correlate are not connected. The interaction network is then compiled from all nodes and edges. Modules are identified as densely connected areas of the network.
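The steps in Figure 1 can be sketched on a toy data set. Everything here is an assumption made for illustration: the six genes and their values are invented, the 0.9 correlation threshold is arbitrary, and connected components stand in for module detection. In a real, densely connected network, connected components would lump almost every gene together, so methods that look for regions with more internal than external edges are needed; this sketch does not reproduce the procedure of Li et al.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def build_network(expression, threshold=0.9):
    """Adjacency sets: an edge links two genes when |r| >= threshold."""
    edges = {gene: set() for gene in expression}
    for g1, g2 in combinations(expression, 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            edges[g1].add(g2)
            edges[g2].add(g1)
    return edges

def find_modules(edges):
    """Connected components with more than one gene (a simplified stand-in)."""
    seen, modules = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(edges[gene] - component)
        seen |= component
        if len(component) > 1:
            modules.append(component)
    return modules

# Hypothetical expression of six genes (rows) across five samples (columns).
expression = {
    "G1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "G2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "G3": [0.9, 2.2, 2.9, 4.1, 5.2],
    "G4": [5.0, 1.0, 4.0, 2.0, 3.0],
    "G5": [10.1, 2.0, 7.9, 4.2, 6.0],
    "G6": [3.0, 3.1, 2.9, 3.0, 3.2],
}

modules = find_modules(build_network(expression))
```

On these toy values G1, G2 and G3 form one module and G4 and G5 another, while G6 correlates strongly with nothing and is left unconnected.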

Why did they go to the trouble of coming up with this set of modules? Because without them, changes elicited by some of the vaccines they were studying were too subtle to be easily detected. They analyzed PBMC expression profiles from 30 healthy donors vaccinated with one of two vaccines against the bacterial pathogen Neisseria meningitidis. The first vaccine (MPSV4) is a quadrivalent vaccine containing polysaccharides from four serogroups of the organism; the second (MCV4), a polysaccharide-protein conjugate vaccine, comprises conjugates of the same four polysaccharides together with diphtheria toxoid (DT) protein adjuvant. Initial analysis of the transcriptional response to each of these vaccines using a conventional gene-by-gene analysis yielded a disappointingly small number of differentially expressed genes. The authors found many fewer changes, for instance, than are elicited by other vaccines such as YF-17D (an attenuated live viral vaccine against yellow fever virus)1,2 or the trivalent inactivated influenza vaccine (TIV)3,4. Undaunted, Li and colleagues then applied their collection of blood transcriptional modules to identify subtle differences between vaccines based on changes in the aggregate expression of gene modules.

This time, they found much more striking differences. They saw three broad patterns of expression: a protein recall response that correlated with the antibody response to TIV and with the antibody response to the DT portion of MCV4; a primary viral response elicited by YF-17D; and an anti-polysaccharide signature shared by the response to the polysaccharide portions of MCV4 and MPSV4. In addition, the nature of the modules shared by different vaccines also raised hypotheses about how different vaccines function. For instance, the modules that correlated with the anti-polysaccharide response suggested the involvement of myeloid dendritic cells (DCs). Subsequent experiments confirmed that human myeloid DCs and mouse CD11c+ DCs were efficiently activated by incubation with MPSV4. This study represents the first step toward identifying the molecular signatures that correlate with antibody responses induced by different classes of vaccines.

There are a number of reasons why analyzing complex data sets based on differences in modules of co-regulated genes—rather than individual genes—makes sense. First, a modular approach to studying gene expression is not just analytic grandstanding; rather, it reflects a general design principle found in nature8. As much as hairball figures might exasperate non-systems biologists, they represent a fair approximation of how the cell's business is transacted. Cellular functions do not result from the isolated activities of individual genes. Instead, they arise from the cooperative effects of groups of functionally and sometimes physically interacting gene products. Fortunately, the transcript abundance of genes that function in the same biological process is often co-regulated9. This means that functionally related groups of genes can be detected using statistical tests of association such as the Pearson correlation coefficient (for linear relationships) or by using mutual information (for nonlinear relationships, as used by Li et al.5). Analytic approaches that identify differences in the expression of groups of functionally related genes are therefore likely to stay closer to the underlying biology than those that evaluate genes one by one4.
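The difference between the two kinds of test can be seen on a deliberately artificial example. The sketch below assumes a simple equal-width binning estimator of mutual information, which is only one of several possible estimators and is not necessarily the one used by Li et al.; the data are invented so that one variable depends entirely, but nonlinearly, on the other.

```python
from collections import Counter
from math import log2, sqrt

def pearson(x, y):
    """Pearson correlation coefficient (sensitive to linear relationships)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def binned(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x, y, n_bins=3):
    """Mutual information in bits, estimated from binned values (a crude estimator)."""
    bx, by = binned(x, n_bins), binned(y, n_bins)
    n = len(x)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    return sum((c / n) * log2(c * n / (px[i] * py[j])) for (i, j), c in pxy.items())

# Invented data: y is fully determined by x, but the dependence is U-shaped.
x = [i / 10 for i in range(-10, 11)]
y = [v * v for v in x]

r = pearson(x, y)
mi = mutual_information(x, y)
```

Here the Pearson coefficient is essentially zero because the rising and falling halves of the curve cancel, whereas the mutual information is clearly positive. The binned estimate depends on the number of bins and on sample size, so its absolute value should not be over-interpreted.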

A second advantage is that analysis using modules of genes can detect subtle changes in gene expression in many functionally related genes even when large shifts in the expression of smaller numbers of genes are absent. This is because differences between cell states are often manifested as small changes distributed across networks of genes10. This is particularly important in the analysis of data from genetically heterogeneous human subjects, where modest biological signals generated by vaccination can be swamped by noise, as Li et al.5 demonstrate.
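A back-of-the-envelope calculation shows why pooling helps. The function below is a sketch under stated assumptions, not a method from the paper: every gene in the module shifts by the same amount, measurement noise has the same standard deviation for every gene, and noise is equally correlated between every pair of genes (the parameter `rho`). The numbers passed in are hypothetical.

```python
from math import sqrt

def expected_z(shift, noise_sd, n_per_group, n_genes=1, rho=0.0):
    """Expected z-score for a mean shift in the average of n_genes genes.

    rho is the assumed pairwise correlation of the noise between genes.
    """
    # Standard deviation of the mean of n_genes equally correlated measurements.
    sd_of_average = noise_sd * sqrt((1 + (n_genes - 1) * rho) / n_genes)
    # Standard error of a difference between two groups of n_per_group subjects.
    standard_error = sd_of_average * sqrt(2 / n_per_group)
    return shift / standard_error

# Hypothetical scenario: 0.3-unit shift, unit noise, 15 subjects per group.
single_gene = expected_z(0.3, 1.0, 15)
module_independent = expected_z(0.3, 1.0, 15, n_genes=50)
module_correlated = expected_z(0.3, 1.0, 15, n_genes=50, rho=0.1)
```

With these inputs a single gene has an expected z-score of about 0.8, well short of conventional significance, while a 50-gene module reaches about 5.8 if gene-level noise is independent and about 2.4 if noise is modestly correlated. Co-regulated genes do tend to share noise, so the independent case should be read as an upper bound on the gain.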

Third, analysis at the module level is likely to be more tolerant of measurement inaccuracies. For instance, measurement of differences in the expression of a few genes of interest in vaccinated subjects may be affected by technical variability from one study to the next. However, detection of the coordinated up- or downregulation of hundreds of genes is much less likely to be derailed by inaccurate measurement of a small number of them. Indeed, the modular arrangement of genes involved in important cellular functions may accomplish a similar error-reduction role in nature, insulating the cell from the consequences of aberrant expression of a rogue gene or two11.
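A small numerical example makes the point concrete. The values are invented: a hypothetical module of 20 genes, each truly induced by 0.5 log2 units, in which one gene is badly mis-measured.

```python
from statistics import mean, median

# Hypothetical module: 20 genes, each with a true log2 fold change of 0.5.
true_change = [0.5] * 20

# Suppose a technical artifact makes the first gene appear strongly repressed.
measured = list(true_change)
measured[0] = -3.0

module_mean = mean(measured)      # pulled down, but still positive
module_median = median(measured)  # unaffected by a single outlier
```

Read on its own, the corrupted gene points in the wrong direction, yet the module average still reports induction and the module median is unchanged. Whether to summarize a module by its mean, its median or some other statistic is a design choice that this sketch does not settle.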

Much as Li et al.5 have done, another group has also developed a complementary resource of modules and used it to analyze vaccine responses12. Together, these companion studies provide immunologists with a powerful set of resources for analyzing human PBMC gene-expression profiles. Helpfully, Li et al. have developed an intuitive web interface (http://www.immuneprofiling.org/papers/meni/) and application programming interface (API) to enable their modular resource to be easily used and integrated into other analysis software.

Still, a number of caveats remain to this modular approach to analysis. While the existence of modules of genes may be statistically evident, their function is not always so clear. Assigning a 'name' to a gene module is therefore dependent on expert knowledge that tends to introduce subjectivity into the process. A useful alternative is to identify sets of genes from those differentially expressed in specific, well-annotated experiments, an approach that does not require manual annotation10. In addition, a general concern when studying expression profiles from complex mixtures of cells such as those present in PBMCs is that it is difficult to distinguish between changes in the biology of one cell state and changes in the relative frequencies of cells in a mixed population. However, approaches to deconvolve mixed cell populations analytically13 or to measure expression profiles in individual cells will help address this concern.
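The idea behind analytic deconvolution can be illustrated with the simplest possible case. This sketch assumes only two cell types, known reference profiles for each, and linear mixing of expression on a linear (not log) scale; the profiles are invented and the least-squares estimate shown here is a generic technique, not the specific method of the cited work.

```python
def estimate_fraction(mixed, type_a, type_b):
    """Least-squares estimate of the fraction of type A in a two-component mixture.

    Model: mixed = p * type_a + (1 - p) * type_b, solved for p.
    """
    diff = [a - b for a, b in zip(type_a, type_b)]
    numerator = sum((m - b) * d for m, b, d in zip(mixed, type_b, diff))
    denominator = sum(d * d for d in diff)
    return numerator / denominator

# Hypothetical reference profiles for four genes in two cell types.
t_cell = [100.0, 10.0, 50.0, 5.0]
monocyte = [5.0, 80.0, 50.0, 60.0]

# Two blood samples that differ only in cell composition, not in per-cell expression.
sample_1 = [0.7 * a + 0.3 * b for a, b in zip(t_cell, monocyte)]
sample_2 = [0.5 * a + 0.5 * b for a, b in zip(t_cell, monocyte)]

fraction_1 = estimate_fraction(sample_1, t_cell, monocyte)
fraction_2 = estimate_fraction(sample_2, t_cell, monocyte)
```

The two samples have different bulk expression for three of the four genes even though no cell has changed its transcriptional program; estimating the mixing fraction attributes the difference to composition. Real PBMC samples contain many more cell types, and reference profiles are themselves noisy, so practical methods are considerably more involved.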

Finally, if we are to truly understand the mechanistic basis of the vaccine response, simply cataloging differences in transcriptional modules elicited by vaccination is unlikely to be sufficient by itself. Rather, we will need to combine observation of changes in modular gene expression in humans with directed experiments in vitro or in animal models to understand the biology of the underlying transcriptional circuits. For instance, perturbing putative regulators of expression modules can identify genes that control the transcriptional response to TLR signaling14. The study by Li et al.5 is therefore important not only because it is the first comparative molecular analysis of five different vaccines but also because it starts to identify the transcriptional modules whose biology we must define in order to understand how vaccines work.