Main

Anyone who has taken an undergraduate virology course is familiar with subject matter focused on the structure of viral genomes and the molecular events associated with multistep viral life cycles. The field of virology has done a remarkable job of characterizing and categorizing viruses and defining the steps of viral attachment, entry, replication and release. Moreover, an understanding of viral protein function has paved the way for the development of antiviral drugs that target viral enzymatic activities. However, many of these drugs function poorly at best, and the virus-centric approach has not proved to be well suited for deciphering the complex and multifaceted virus–host interactions that underlie viral recognition, innate immune signalling and disease outcome. In the past decade, tools have become available to chart a new course, one directed at obtaining comprehensive systems-level views of the host response and the interplay between virus and host.

Systems virology is a phrase coined to describe the application of systems biology approaches to the field of virology1. Systems biology is highly interdisciplinary in character, requiring the combined talents of biologists, mathematicians and computer scientists, and the goal of the field is to gain a comprehensive understanding of biological systems. In the case of systems virology, these biological systems can range from virus-infected cells to tissues to whole organisms. Systems-level analyses use high-throughput technologies to measure system-wide changes in biological components such as DNA, RNA, proteins and metabolites, and are dependent on the quality of both the resulting data sets (which are often noisy) and the subsequent data integration and modelling. Ideally, high-throughput data derived from these and other measurements are integrated and analysed using mathematical algorithms to generate predictive models of the system. When a model has been developed, subsequent experimental perturbations of the system (for example, the use of viral mutants or targeted inhibition of host genes or pathways) are used to refine the model and to increase its predictive capacity2,3,4 (Fig. 1).

Figure 1: The systems virology paradigm.

Appropriate experimental models and technologies are used to generate multidimensional data, including virology data. Analyses of such data, in combination with mathematical modelling, are used to generate comprehensive, integrated and predictive models of biological systems and virus–host interactions. Resulting predictions and hypotheses lead to subsequent experimental perturbations, model refinement and a deeper understanding of complex biological processes. These findings and outcomes can be used directly or further refined by the scientific community for various types of disease intervention.


This holistic, host-directed approach stands in contrast to the more traditional reductionist approaches that focus on a pre-determined small set of molecules (genes, proteins or metabolites). Although systems-level (or discovery-based) analyses are often criticized for not being hypothesis-driven, these analyses are increasingly being acknowledged as potent hypothesis generators. Moreover, for dynamic systems such as those involved in the host response to viral infection, systems-level analyses are considered the only way to understand emergent properties — that is, properties or biological outcomes that cannot be predicted by an understanding of the individual parts of a system alone, but rather become apparent only with knowledge of the specific organization of, and interactions between, components5. Because of this, systems virology is an essential and synergistic complement to traditional virology approaches.

This Review focuses on the host response to virus infection and discusses the evolution and significant findings of systems virology, including the identification of gene expression signatures that are predictive of viral pathogenesis and vaccine efficacy, insights into how viruses disrupt cellular metabolism, and the mapping of virus–host interactomes. These accomplishments did not come from a single experiment or study, but rather from a body of work undertaken over several years by different investigators. The field has seen a progression from genomics-based approaches to measurements of proteins and metabolites, and has also embraced the analysis of host genetic variation as a means to better understand disease processes rather than viewing such variation as a source of frustration. Moving forwards, systems virology must also embrace computational approaches that are capable of integrating this information to construct robust virus–host interaction models that incorporate multiple dimensions and scales6,7. We cite examples of studies that are moving in this direction and outline what the next phase of systems virology must encompass to reach its full potential.

Gene expression signatures

With the completion of the human genome project and the advent of microarrays capable of measuring RNA transcripts at a genome-wide scale, the first systems-level analyses became a reality. Over the past 12 years, DNA microarrays have evolved from hundreds of cDNAs spotted on nylon membranes to glass slides containing high-density oligonucleotides that encompass entire genomes (Box 1). Microarrays (then covering only 1,500 human genes) were first used in virology to evaluate the changes in cellular gene expression that occurred in a CD4+ T cell line infected with HIV8. Since then, the use of microarrays to evaluate changes in host gene expression in response to viral infection has become commonplace. In many cases, these studies remain small and narrowly focused, and although they provide glimpses into global responses, low sample numbers make it difficult to determine their reproducibility. In addition, these studies do not provide the robust data sets that are needed for computational modelling, which could result in deep insights into system architecture or behaviour. This shortcoming has perhaps fed scepticism regarding the ability of genome-wide expression profiling to yield transformative discoveries. Below, we provide examples of more comprehensive studies resulting in genomic signatures that have increased our understanding of both viral pathogenesis and the characteristics of the host response required for immune protection.

Signatures of highly pathogenic respiratory viruses. Influenza virus is well known for its ability to rapidly evolve new variants through genetic mutation and genome reassortment, yielding strains that can vary widely in virulence and transmissibility. Most strains cause mild respiratory disease, whereas others, such as the 1918 pandemic virus and highly pathogenic avian H5N1 influenza virus strains (such as A/VN/1203/04), can cause severe and often fatal infections9. The field has used DNA microarrays and functional analyses to define the virus–host interactions that regulate influenza virus pathogenesis10. These studies have identified gene expression signatures that correlate with virulence and have revealed that the timing and magnitude of the host response is a crucial determinant of the eventual outcome of infection.

This phenomenon was first demonstrated in studies that used a combination of mouse and macaque infection models and genome-wide transcriptional profiling to measure the host response to the reconstructed 1918 pandemic virus. In these animal models, the virus causes a rapidly fatal infection marked by severe lung pathology, intense neutrophil infiltration and the rapid and sustained induction of pro-inflammatory cytokine and chemokine genes11,12, an event often referred to as a cytokine storm13. By contrast, macaques infected with a highly pathogenic avian H5N1 influenza virus strain show a rapid and intense induction of interferon (IFN) and innate immune genes, but this eventually resolves as the animals recover14. Genomic analyses have also revealed that macaques infected with the 1918 pandemic strain15 and mice infected with H5N1 influenza virus A/VN/1203/04 show a strong induction of genes encoding inflammasome components (for example, NLRP3 (NOD-, LRR- and pyrin domain-containing 3) and interleukin-1β)16. This H5N1 virus is particularly virulent in mice, and although the inflammasome is part of the innate immune response to influenza A viruses17,18,19, the excessive activation of this response seems to be detrimental in this host.

As high-throughput data have accumulated in public databases, it has become possible to use this information to carry out meta-analyses. This strategy has been used to analyse data from a compendium of published studies that used mouse models to measure host transcriptional responses to lethal or non-lethal strains of influenza virus, respiratory syncytial virus or severe acute respiratory syndrome coronavirus (SARS CoV)20. Two alternative methods were used to generate gene expression signatures that are predictive of high or mild pathogenicity (defined here as 100% mortality and 100% survival, respectively). The first signature consists of 74 genes for which expression changes in the opposite direction in highly versus mildly pathogenic infections, with respect to mock infections (referred to as a 'digital' relationship). The second signature consists of 57 genes that are differentially expressed between highly and mildly pathogenic infections, without reference to mock infections (referred to as an 'analogue' relationship). Most genes in the analogue signature are differentially expressed during both lethal and non-lethal infections (compared with mock infections), but high pathogenicity corresponds with a higher degree of differential expression.
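The distinction between the two signature types can be made concrete with a short Python sketch. It assumes, for illustration only, that each gene is summarized by a mean log2 expression value in mock, mildly pathogenic and highly pathogenic infections, and it applies an arbitrary 1.0 log2 threshold. The gene names and values are hypothetical, and the published meta-analysis used its own statistical criteria across many profiles rather than this simple cut-off.

```python
from dataclasses import dataclass


@dataclass
class GeneProfile:
    name: str
    mock: float    # mean log2 expression in mock-infected lungs (hypothetical)
    mild: float    # mean log2 expression in mildly pathogenic (non-lethal) infections
    severe: float  # mean log2 expression in highly pathogenic (lethal) infections


def is_digital(g: GeneProfile, threshold: float = 1.0) -> bool:
    """'Digital': both infections change the gene relative to mock, in opposite directions."""
    d_mild = g.mild - g.mock
    d_severe = g.severe - g.mock
    return (abs(d_mild) >= threshold and abs(d_severe) >= threshold
            and d_mild * d_severe < 0)


def is_analogue(g: GeneProfile, threshold: float = 1.0) -> bool:
    """'Analogue': severe and mild infections differ, without reference to mock."""
    return abs(g.severe - g.mild) >= threshold


genes = [
    GeneProfile("geneA", mock=5.0, mild=6.5, severe=3.5),  # opposite directions vs mock
    GeneProfile("geneB", mock=5.0, mild=6.2, severe=8.9),  # same direction, larger when severe
    GeneProfile("geneC", mock=5.0, mild=5.1, severe=5.3),  # essentially unchanged
]

digital = [g.name for g in genes if is_digital(g)]
analogue = [g.name for g in genes if is_analogue(g)]
print(digital)   # ['geneA']
print(analogue)  # ['geneA', 'geneB']
```

geneB is the kind of gene the analogue signature is designed to capture: it moves in the same direction relative to mock in both infection types, so it cannot enter the digital set, but it changes much more in the lethal infection.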

When the two signatures were tested for their ability to predict pathogenicity, the best predictor of a highly pathogenic infection was the analogue signature (Fig. 2). Notably, this meta-analysis did not take the time after infection into account; that is, data from samples isolated at 1 day and at 5 days post-infection were treated equally. However, the majority of samples that were correctly identified as being from either a mildly or a highly pathogenic infection were from early or late time points, respectively. It is therefore likely that taking time post-infection into account would yield an even more accurate signature.

Figure 2: A 57-gene analogue signature predicts respiratory virus pathogenicity.

a | A meta-analysis was carried out on publicly available transcriptional profiles of lung samples obtained from mice infected with influenza virus or severe acute respiratory syndrome coronavirus (SARS CoV). Two signatures predictive of the severity of infection (non-lethal versus lethal) were derived: a digital signature and an analogue signature. b–d | Each dot represents a gene expression profile; non-lethal infections are indicated by green dots, and lethal infections are indicated by red dots. Regions defined for positive identification of non-lethal (green square) and lethal (red square) infections are indicated. Classification results are shown for the entire gene expression profile (part b), the digital gene signature (part c) and the analogue gene signature (part d). The digital signature comprises a subset of genes (groups 1 and 2; 74 genes) for which expression changes in the opposite direction in lethal versus non-lethal infections, with respect to mock infection, and the analogue signature comprises a subset of genes (groups 3 and 4; 57 genes) that are differentially expressed between lethal and non-lethal infections without reference to mock infections. The analogue signature was more accurate at predicting lethal viral infection (true positives (TP) = 43%) than either the digital signature (TP = 1%) or the entire gene expression profile (TP = 0%).


From these studies, it is now apparent that highly pathogenic respiratory viruses induce or suppress the expression of many of the same genes as mildly pathogenic viruses, but to a greater degree (and with different kinetics). Therefore, knowing the identity of genes that are differentially expressed in response to infection provides only part of the information needed to predict pathogenicity. The magnitude and timing of the host response are crucial determinants of the eventual disease outcome and might have important implications for antiviral therapy. To date, efforts to target the host response with a range of anti-inflammatory drugs have been largely unsuccessful13, and it is likely that effective host-directed therapy will depend not only on the target, but also on exactly when elements of the host response are suppressed or enhanced. Moreover, these findings point to the need for computational approaches that can describe nonlinear relationships (see below) and account for various factors associated with large multivariate data sets21. This also suggests the need for a paradigm shift in biomarker discovery to one that looks at sets of quantitative molecular measurements.

Signatures of vaccine efficacy. The application of systems-level analyses to vaccine research (variously termed systems vaccinology22 or vaccinomics23) has led to the identification of molecular signatures that are predictive of vaccine immunogenicity and to new insights into the mechanisms of action of vaccines. In one of the first large-scale uses of this strategy, gene expression profiling and computational methods were used to identify gene expression signatures predictive of the strength of the human adaptive immune response to the yellow fever vaccine, YF-17D24.

Transcriptional profiling of peripheral blood mononuclear cells from vaccinated subjects revealed that YF-17D induces the expression of genes encoding proteins involved in viral recognition and transcription factors that regulate type I IFN expression. Although also characteristic of the transcriptional response to active viral infection, this response did not correlate with subsequent CD8+ T cell or neutralizing-antibody responses, which are thought to mediate protection. However, an alternative computational and classification method, discriminant analysis via mixed integer programming (DAMIP), identified a signature (consisting of complement system and stress response genes) that is highly accurate in predicting subsequent CD8+ T cell activation. This method also identified a separate signature (which includes tumour necrosis factor (TNF) receptor superfamily members) that accurately predicts the magnitude of the neutralizing-antibody response. The robustness of these signatures was verified through the analysis of samples from a different group of subjects vaccinated with a different lot of vaccine, thereby identifying new correlates of vaccine immunogenicity24.

More recently, similar systems approaches have been used to evaluate innate and adaptive immune responses to vaccination against influenza viruses, with the goal of identifying early gene expression signatures that correlate with immunogenicity25. Over a 3-year period, a series of clinical studies was undertaken in which young adults were vaccinated with either trivalent inactivated influenza vaccine (TIV) or live-attenuated influenza vaccine (LAIV). Molecular signatures for predicting antibody responses were identified by combining gene expression profiling, antibody response data, real-time PCR analysis and DAMIP. The resulting predictive signature consisted of genes with known roles in antibody responses and other genes with previously unidentified roles in antibody or B cell responses. For example, one gene from the predictive signature, Camk4, encodes calcium–calmodulin-dependent protein kinase type IV (CaMKIV), a protein known to be involved in multiple immune system processes. However, it was not known whether this protein had a role in antibody responses25. To demonstrate the ability of the systems biology approaches used in this study to identify biologically significant targets, Camk4-knockout and wild-type mice were vaccinated with TIV; the Camk4-knockout mice exhibited significantly higher antibody titres than wild-type mice 7, 14 and 28 days after vaccination, thus revealing that Camk4 is important in regulating B cell responses. Although further investigations are needed to confirm many of these signature predictions, these studies demonstrate that systems approaches can both identify biological targets and generate new testable hypotheses related to the mechanism of vaccine action.

An expanding view of the transcriptome. Until recently, transcriptional profiling depended on the use of microarrays to measure the expression of well-annotated protein-coding genes. The advent of next-generation sequencing (Box 2), however, has brought the ability to rapidly sequence the entire RNA complement of cells or tissues. This has led to a much expanded concept of the host transcriptome, as most recently revealed by the Encyclopedia of DNA Elements (ENCODE) project26. It is now apparent that as much as three-quarters of the human genome is capable of being transcribed and that cells contain vast numbers and varieties of non-protein-coding RNAs27. Some of these non-coding RNAs, such as microRNAs, have been well studied and are known to have roles in viral infection28. For most others, functionality is less clear, but there is growing evidence that long non-coding RNAs also play a part in transcriptional and epigenetic gene regulation and in disease29.

RNA sequencing (RNA-seq) analysis of the host response to SARS CoV infection has revealed the differential expression of a range of host long non-coding RNAs (>200 nucleotides long) in lung samples from virus-infected mice30. Many of these RNAs have similar expression patterns in vitro during influenza virus infection and during type I IFN treatment, suggesting that they are involved in regulating the innate immune response to a range of viruses30. Expanding these analyses to include the sequencing of small RNAs also revealed the differential expression of more than 200 small RNAs, such as small nucleolar RNAs (snoRNAs) and PIWI-interacting RNAs (piRNAs), in response to SARS CoV and influenza virus infections31. Similarly, RNA-seq has revealed that an HIV-infected CD4+ T cell line exhibits differential expression of host microRNAs, snoRNAs and pseudogenes compared with uninfected cells32. Viral mRNA constitutes a surprisingly large portion of the total RNA in HIV-infected CD4+ T cells (in this study, nearly 40% by 24 hours after infection), and reads mapping to the viral genome have revealed novel viral RNA splice variants. A correlative analysis that combined mRNA-seq and small RNA-seq data suggested additional roles for host microRNAs in T cell activation and in transcriptional and cell cycle regulation during HIV infection33.

Together, these studies attest to the power of RNA-seq to provide entirely new views of the transcriptional landscape and to highlight previously unanticipated changes in transcription that occur in response to viral infection. Such insights are not limited to host transcription, as a combination of RNA-seq and mass spectrometry recently revealed that human cytomegalovirus (HCMV; a 240 kb DNA virus) produces hundreds of previously unidentified transcripts and short proteins that might have functional, regulatory or antigenic properties34. Although the functional significance of changes in non-coding-RNA expression is only beginning to be examined29, a better grasp of non-coding-RNA expression and function will certainly be necessary for a complete understanding of viral pathogenesis and the innate and adaptive immune responses, and for a more general view of gene regulation in the context of viral infection. Unfortunately, despite the advances that can be made using RNA-seq, the extensive computing infrastructure needed to handle large data files and the computational prowess required to align, assemble and analyse short sequence reads continue to put the approach out of reach for most laboratories.

Beyond the transcriptome

Of course, gene expression profiling provides only one measure of the host response to infection. In recent years, advances in system-wide technologies have facilitated a 'multi-omics' approach that includes proteomics, metabolomics, lipidomics and the elucidation of virus–host protein interactomes. All these measurements are adding to our understanding of the host response to viral infection, and new abilities to evaluate the role of host genetics and epigenetics are adding further layers of complexity.

Alterations in cellular metabolism. Viruses have long been known to cause changes in host metabolism; however, the full extent of such changes was not clear until systems approaches were used to evaluate the metabolomic reprogramming that results from HCMV infection35. Using liquid chromatography–mass spectrometry to directly measure the levels of more than 160 different metabolites, it was discovered that HCMV infection induces large increases in numerous metabolites, including glycolytic and tricarboxylic acid (TCA) cycle intermediates, amino acids, NADH and pyrimidine. The metabolic signature induced by HCMV infection is readily distinguishable from the signature associated with the transition of quiescent cells into the G1 phase of the cell cycle, revealing the replacement of cellular metabolic homeostasis with an HCMV-specific metabolic programme. Systems-level metabolic flux profiling produced a first-of-a-kind metabolic map showing linkages between compounds and quantitative information about metabolic activity. The map is biochemically revealing, indicating that there is a global upregulation of metabolism by HCMV, with the greatest increase in the TCA cycle and its efflux to feed fatty acid biosynthesis36.

To determine whether this reprogramming of the host metabolome is cell type-specific or virus-specific, fibroblast and epithelial cells were used to compare the host response to two strains of HCMV and two strains of herpes simplex virus type 1 (HSV-1)37. All four viruses produced significant changes in approximately 50% of the metabolome. Interestingly, the changes are consistent across different strains of the same virus and across cell types, but differ markedly between HCMV and HSV-1, demonstrating that these viruses induce distinct metabolic programmes. The findings derived from these systems-level approaches generated the hypotheses needed to drive additional focused studies in which more traditional methods were used to investigate the molecular mechanisms underlying the virus-specific hijacking of the host metabolome and the relevance of these mechanisms to potential therapeutic interventions38,39,40,41,42.

Genomic and lipidomic analyses have also revealed that infection of primary bone marrow-derived macrophages with mouse cytomegalovirus results in a downregulation of metabolites involved in the cholesterol metabolism pathway43. The lowering of cholesterol levels is mediated through the IFN-dependent downregulation of sterol regulatory element-binding protein 2 (SREBP2), a transcription factor that regulates sterol biosynthesis. Pharmacological or RNAi-mediated inhibition of the sterol pathway was shown to result in increased protection against viral infection in cell culture and in mice, demonstrating the potential benefit of targeting a host metabolic pathway as an antiviral strategy.

Virus–host interactomes. Based on the premise that viral proteins interact with cellular factors to promote efficient viral replication and pathogenesis, system-wide small interfering RNA (siRNA) or short hairpin RNA (shRNA) screens, yeast two-hybrid libraries and bioinformatic methods are being used to construct and describe virus–host interactomes and in turn identify cellular targets for therapeutic intervention. Such interactomes have been generated for numerous viruses, including influenza virus, HIV, dengue virus, hepatitis C virus (HCV), herpesviruses and SARS CoV, and have yielded 'hit lists' of cellular factors that might be important in viral pathogenesis44,45,46,47,48,49,50,51. Although the number of interactions identified by these studies is impressive, few of the genes identified have been subjected to functional analyses to confirm their role in viral replication. Moreover, there is little overlap in the host factors identified by the different screens. This could be due to a variety of factors, including variation in the screening systems, cell types and viruses used, as well as differences in the methods applied to identify interacting partners. The benefits and shortcomings of these studies, along with factors affecting their outcome, have been the subject of several detailed reviews52,53,54,55.

To fully realize the collective information residing in this vast collection of data, it will be necessary to develop computational and mathematical methods that are capable of fully integrating interactomes (as well as their associated metadata) that have been constructed using different methods or obtained from different biological systems. In a first attempt at such integration, meta-analysis of virus–host interactome data for five distinct viruses identified both common and virus-specific human protein targets56. Common targets include proteins involved in the cell cycle, apoptosis, the unfolded-protein response and nuclear transport. Many of the common host targets identified are multifunctional hubs (that is, they have multiple functions or roles within the cell). When coupled with the fact that many viral proteins interact with multiple host proteins, this combination of factors might explain how viruses, with their relatively small genomes, are capable of dysregulating so many aspects of host biology. Unfortunately, because these common host targets often have numerous cellular functions, focusing on them as drug targets might be problematic52.
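As a rough illustration of how common and virus-specific targets can be separated once interactomes are pooled, the Python sketch below counts how many viruses target each host protein. The virus and protein names are placeholders, and the cut-off of at least two viruses for a 'common' target is an assumption, not the criterion used in the cited meta-analysis. Testing whether the common targets are also hubs would additionally require a host protein–protein interaction network, which is not modelled here.

```python
from collections import Counter

# Hypothetical toy data: host proteins reported to interact with each virus's proteins.
interactomes = {
    "virus1": {"P1", "P2", "P3"},
    "virus2": {"P1", "P4"},
    "virus3": {"P1", "P2", "P5"},
}


def split_targets(interactomes, min_viruses=2):
    """Return (common, specific) host targets.

    common: proteins targeted by at least `min_viruses` viruses (assumed cut-off).
    specific: for each virus, the proteins that no other virus targets.
    """
    counts = Counter(p for targets in interactomes.values() for p in targets)
    common = {p for p, n in counts.items() if n >= min_viruses}
    specific = {virus: {p for p in targets if counts[p] == 1}
                for virus, targets in interactomes.items()}
    return common, specific


common, specific = split_targets(interactomes)
print(sorted(common))                              # ['P1', 'P2']
print({v: sorted(s) for v, s in specific.items()})
# {'virus1': ['P3'], 'virus2': ['P4'], 'virus3': ['P5']}
```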

In an interesting twist on these studies, virus–host interactome data, together with data on host transcriptional changes resulting from the expression of 123 viral ORFs derived from DNA tumour viruses, were used to predict cellular genomic variations (for example, mutations, deletions or translocations) that can give rise to cancer57. By defining the rewiring of cellular networks and pathways caused by the viral proteins and identifying a list of host proteins that are central to the rewiring, the systematic identification of host targets of DNA tumour viruses was found to be as successful as traditional large-scale cataloguing of tumour mutations for cancer gene identification. This suggests that disease phenotypes, whether resulting from viral infection or cancer, might be the result of network perturbations rather than individual genetic or genomic variations.

Host genetics and epigenetics. Host genetic variation has typically been thought of as a confounding factor that limits the ability to draw conclusions from data obtained using outbred (for example, human and nonhuman primate) populations. More recently, attempts are being made to better understand how genetic diversity influences infection outcome and how knowledge of genetic diversity can be incorporated into the construction of robust and predictive network models. One particularly exciting example is the Collaborative Cross mouse resource. The Collaborative Cross is a unique panel of multiparental recombinant inbred mouse strains designed to capture the level of genetic diversity found in outbred populations; this provides a resource for systematically identifying individual and multiple host genetic traits that contribute to complex immune phenotypes and disease outcome (Box 3).

As an example, a genetically diverse panel of pre-Collaborative Cross mice (not fully inbred) determined to have severe or mild responses to influenza virus infection was used to identify expression quantitative trait loci (eQTLs) associated with the host response to infection. Twenty-one high-confidence eQTLs were identified, 17 of which were confirmed using mice from the eight Collaborative Cross founder strains58. Many of these genes have known functions related to immunity or the host response to infection, such as Ifi27l2a (IFNα-inducible 27-like 2A), Clec16a, Pde7a (high-affinity cyclic AMP-specific 3′,5′-cyclic phosphodiesterase 7A) and Tcf7l1 (transcription factor 7-like 1), whereas for others, such as the serine/threonine protein kinase gene Sik1 and sentrin-specific protease 5 (Senp5), their role in influenza virus infection is not yet clear. Structural equation modelling was used to identify potential regulatory relationships between additional genes and the validated eQTLs, suggesting that these genes and corresponding subnetworks have important roles in the host response by either promoting a protective (mild) or a pathologic (severe) response to influenza virus infection, depending on the specific genetics of the host. Thus, using this genetically diverse population, high-confidence gene candidates involved in regulating the host response to influenza virus infection were identified, allowing for future investigations into their utility as therapeutic targets.
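The core statistical step in eQTL mapping can be sketched as a per-locus regression of a gene's expression on genotype. The Python below is a minimal, hypothetical example: it codes genotype as a 0/1/2 allele dosage at each candidate locus and reports the slope and the fraction of expression variance explained. Real analyses in multiparental populations such as the Collaborative Cross typically model founder haplotypes, include covariates and set genome-wide significance thresholds (for example, by permutation); none of that, nor the structural equation modelling described above, is shown here.

```python
def eqtl_association(genotypes, expression):
    """Least-squares fit of expression ~ genotype for one gene-locus pair.

    genotypes: allele dosage per mouse (0, 1 or 2 copies of one allele).
    expression: log2 expression of the gene in the same mice.
    Returns (slope, r_squared).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((e - mean_e) ** 2 for e in expression)
    if sxx == 0 or syy == 0:
        return 0.0, 0.0  # locus or gene does not vary across mice: nothing to test
    slope = sxy / sxx
    return slope, sxy ** 2 / (sxx * syy)


# Hypothetical data: log2 expression of one gene in six mice,
# and allele dosage at two candidate loci in the same mice.
expression = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]
loci = {
    "locus1": [0, 0, 1, 1, 2, 2],
    "locus2": [0, 2, 0, 2, 1, 1],
}

results = {name: eqtl_association(g, expression) for name, g in loci.items()}
best = max(results, key=lambda name: results[name][1])
print(best)  # locus1
```

Here each additional copy of the locus1 allele raises expression by about one log2 unit, whereas locus2 explains essentially none of the variation.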

It is also becoming apparent that epigenetic mechanisms have a role in regulating the outcome of viral infection, and methods for genome-scale mapping of DNA methylation59 and histone modification60 are now available. Such epigenetic modifications contribute to chromatin structure and organization, which in turn influence transcriptional activity, the immune response61 and viral latency62. Viruses can also use epigenetic control mechanisms to their advantage; the NS1 protein of H3N2 influenza virus, for example, acts as a histone mimic to suppress the expression of antiviral genes63. Characterizing and understanding epigenetic mechanisms might therefore be an essential requirement for the construction of gene regulatory networks64,65.

Putting the pieces together

As more and more high-throughput data become available, systems virology is poised to enter a new phase to fulfil its initial promise of revolutionizing our understanding of virus–host interactions. To do this, the field must move beyond just the listing of molecules that are differentially expressed on viral infection. Instead, the relationships between key molecules must be defined. Such relationships may be cause-and-effect relationships (for example, transcription factors and their target genes), the result of co-expression, or due to genetic or direct physical interactions. Here, we give examples of several methods that are being used to further our understanding of virus–host interaction networks, and we then discuss key computational challenges that must be addressed.

Network modelling and analysis explores the relationships among molecules, and the structure and organization of these relationships, to predict the behaviour of the network or system. For example, the context likelihood of relatedness (CLR) method infers functional associations between genes from their expression profiles, and the resulting networks can be used to identify genes that are highly interconnected (referred to as hubs) or that exhibit a high degree of betweenness centrality (referred to as bottlenecks). Genes with high betweenness centrality typically have fewer connections than hub genes, but because they are located between (and connect or bridge) multiple subnetworks, they can have a powerful role in controlling network signalling (Fig. 3). Bottleneck genes often function as key genes in the regulation of disease progression and are therefore attractive targets for further experimentation66,67. An alternative method, co-regulation network analysis (PCluster)68, has been combined with genome-wide expression profiling and yeast two-hybrid analysis to identify relationships between gene expression and direct physical interactions, revealing previously unrecognized roles for several cellular and viral proteins in the host response to H1N1 influenza virus69. These proteins include a network of RNA-binding proteins, components of the WNT signalling pathway and viral polymerase subunits. However, as these types of analyses often only infer correlations between network components, additional studies are required to verify model predictions.
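The idea behind CLR can be sketched in a few lines of Python. As described by its developers, CLR scores a gene pair by how far their mutual information (MI) stands above each gene's own background of MI with all other genes, combining the two one-sided z-scores as sqrt(z_i^2 + z_j^2). For simplicity, the sketch below assumes equal-width binning and a plug-in MI estimate (published implementations use more careful estimators), and the four expression profiles are invented for illustration.

```python
import math
from collections import Counter


def discretize(values, bins=3):
    """Equal-width binning (a simplifying assumption for this sketch)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete vectors."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())


def clr_scores(expr, bins=3):
    """CLR score for every gene pair: sqrt(z_i**2 + z_j**2), where z_i is the
    non-negative z-score of MI(i, j) against gene i's MI with all other genes."""
    genes = sorted(expr)
    disc = {g: discretize(expr[g], bins) for g in genes}
    mi = {(a, b): mutual_information(disc[a], disc[b])
          for a in genes for b in genes if a != b}
    background = {}
    for g in genes:
        row = [mi[(g, h)] for h in genes if h != g]
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((v - mean) ** 2 for v in row) / len(row))
        background[g] = (mean, sd)

    def z(gene, value):
        mean, sd = background[gene]
        # A (near-)constant background carries no information, so score it 0.
        return max(0.0, (value - mean) / sd) if sd > 1e-12 else 0.0

    return {(a, b): math.hypot(z(a, mi[(a, b)]), z(b, mi[(a, b)]))
            for a in genes for b in genes if a < b}


# Hypothetical expression profiles across nine samples.
expr = {
    "g1": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "g2": [2, 4, 6, 8, 10, 12, 14, 16, 18],  # co-varies with g1
    "g3": [1, 5, 9, 1, 5, 9, 1, 5, 9],       # unrelated pattern
    "g4": [1, 9, 5, 5, 1, 9, 9, 5, 1],       # unrelated pattern
}
scores = clr_scores(expr)
print(max(scores, key=scores.get))  # ('g1', 'g2')
```

Because g3 and g4 share no information with any other gene, their backgrounds are flat and all of their pairs score zero, whereas the co-varying pair g1 and g2 stands out. In practice, pairs scoring above a chosen CLR threshold become the edges of the network on which hubs and bottlenecks are then identified.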

Figure 3: Co-regulation networks.

a | A schematic showing one bottleneck gene and two hub genes in a hypothetical network. Hubs are highly connected to other nodes in the network. Bottlenecks have a high degree of betweenness centrality and connect or act as bridges between subnetworks. b | The context likelihood of relatedness method was used to infer functional associations between differentially expressed genes responding to H5N1 influenza virus infection in mice. Genes (nodes) are coloured according to the ratio of gene expression values for high-dose infection to low-dose infection: red indicates higher expression and blue indicates lower expression following the high dose than following the low dose. The complexity of these networks precludes identifying bottlenecks by eye; instead, the betweenness centrality is calculated for each node, and the size of each node is scaled to this value. The larger nodes are major bridges between different parts of the network. c | A major hub from a separate subnetwork within the H5N1 influenza virus co-regulatory network.

These network analysis methods have also been applied to networks derived from proteomic and lipidomic profiling data. For example, topological analysis identified two mitochondrial fatty acid oxidation enzymes, DCI (also known as ECI1) and HADHB, as bottleneck proteins and possible targets through which HCV disrupts cellular metabolic homeostasis70. The importance of DCI (and of cellular metabolic homeostasis in general) during HCV infection was then confirmed by additional means, including pharmacological inhibition of fatty acid oxidation and targeted siRNA knockdown, both of which demonstrated that DCI is required for productive HCV infection in hepatoma cell lines71,72. Similarly, an analysis of the interaction networks between host proteins and the HCV core and NS4B proteins has been used to identify host proteins that constitute potential anti-HCV therapeutic targets, including α-enolase, paxillin and a solute carrier protein (SLC25A5)73. A better understanding of network topology will not only provide the opportunity to identify potential targets for therapeutic intervention but also offer insights into possible off-target effects on network signalling that might be induced by drug treatment.
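The distinction between hubs and bottlenecks drawn in Fig. 3a can be computed directly from a network's topology. The Python sketch below uses Brandes' algorithm to calculate unnormalized betweenness centrality alongside node degree for an undirected, unweighted graph. The toy network (three densely connected subnetworks joined through a single node, B), the helper `connect` and all node names are hypothetical and are not derived from the HCV data discussed above.

```python
from collections import deque


def degree_and_betweenness(adj):
    """Degree and unnormalized betweenness centrality for an undirected,
    unweighted graph {node: set_of_neighbours}, via Brandes' algorithm."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies, farthest nodes first
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint
    bc = {v: b / 2 for v, b in bc.items()}
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    return degree, bc


adj = {}


def connect(u, v):
    """Add an undirected edge to the hypothetical network."""
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


# Three 4-node cliques, each attached to the bridge node "B" via its first member
for prefix in "ACD":
    members = [f"{prefix}{i}" for i in range(1, 5)]
    for i, u in enumerate(members):
        for v in members[i + 1:]:
            connect(u, v)
    connect("B", members[0])

degree, bc = degree_and_betweenness(adj)
hubs = sorted(v for v in adj if degree[v] == max(degree.values()))
bottleneck = max(bc, key=bc.get)
```

In this toy graph, B has degree 3, lower than the degree-4 nodes A1, C1 and D1, yet it has the highest betweenness (48, versus 27 for each of those nodes) because every shortest path between subnetworks passes through it.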

Although much can be learned from the construction and topological analysis of host–pathogen interaction networks using samples from whole tissues, the heterogeneity of cell types present in most tissues means that such networks provide a generalized picture of the changes that occur in the host during the course of infection. For example, it is difficult to delineate from these types of networks the signalling events that might occur between infected lung epithelial cells and cells of the immune system, both inside and outside the infected tissue. These intercellular interactions are also controlled by signal transduction pathways, which communicate signals from the extracellular environment to intracellular effector processes. To fully interpret the data, we need a much better understanding of intercellular signalling processes, the cells that are involved and the directionality of their effects on infection outcome.

A few studies have begun to explore cell type-specific and intercellular signalling on a system-wide scale. For example, flow cytometry and gene expression data from bronchoalveolar lavage (BAL) fluid from young adult and aged macaques infected with 2009 pandemic H1N1 influenza virus were analysed in conjunction with data from the Immune Response In Silico (IRIS) database, which contains cell type-specific gene expression patterns for various types of immune cells. By computationally comparing differentially expressed genes in BAL with these cell type-specific patterns, it was possible to identify genes associated with specific immune cell types, including activated dendritic cells, CD4+ and CD8+ T cells and naive B cells. In particular, genes associated with B cell and T cell markers were more highly upregulated in young adult animals than in aged animals74.
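The text does not specify how differentially expressed genes in BAL were matched to IRIS patterns. One common way to frame such a comparison, shown here purely as an assumption, is a one-sided hypergeometric test of whether the overlap between the differentially expressed genes and each cell type-specific gene set is larger than expected by chance. The function names, marker lists and gene universe size below are illustrative only and are not taken from IRIS.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / comb(population, draws)


def celltype_enrichment(de_genes, signatures, universe_size):
    """For each cell type, return (overlap size, one-sided hypergeometric P value)."""
    de = set(de_genes)
    results = {}
    for cell_type, markers in signatures.items():
        markers = set(markers)
        overlap = len(de & markers)
        p = hypergeom_sf(overlap, universe_size, len(markers), len(de))
        results[cell_type] = (overlap, p)
    return results


# Illustrative marker lists (hypothetical, not taken from IRIS)
signatures = {
    "B cell": ["CD19", "CD79A", "MS4A1", "PAX5"],
    "T cell": ["CD3E", "CD4", "CD8A", "LCK"],
}
# Hypothetical differentially expressed genes from a BAL sample
de_genes = ["CD19", "CD79A", "MS4A1", "IFIT1", "MX1"]
results = celltype_enrichment(de_genes, signatures, universe_size=20000)
```

Here three of the five hypothetical genes fall in the B cell list, giving a very small P value, whereas the T cell list has no overlap and a P value of 1. When many cell types are tested, the resulting P values would normally be corrected for multiple comparisons.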

Recent studies of mouse models of intestinal inflammation induced by TNF treatment provide another good example of how systems approaches can be used to evaluate signalling between cell types in a complex tissue environment75,76. In these studies, the authors combined flow cytometry measurements with phosphoprotein, cytokine and chemokine expression data from various immune cell types, collected over time and under diverse conditions. This allowed them to construct statistically robust multivariate regression models that related the phosphoprotein signals, cytokines, chemokines and cell types to specific phenotypes. These models helped to elucidate key molecular and cellular processes governing epithelial cell apoptosis and proliferation in response to TNF treatment. For example, the model predicted that monocyte chemotactic protein 1 (MCP1; also known as CCL2) would be especially protective against TNF-induced apoptosis, and this hypothesis was confirmed by treating mice with an MCP1-specific antibody before TNF administration. The model also indicated that plasmacytoid dendritic cells might be particularly important for inducing apoptosis, and depleting these cells from mice under conditions that normally produced the most severe epithelial cell apoptosis did indeed reduce the TNF-induced phenotype to the mildest outcome.
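As a minimal illustration of relating measured signals to a phenotype by regression, the sketch below fits an ordinary least-squares model with an intercept by solving the normal equations with Gauss–Jordan elimination. This is a simplified stand-in and not necessarily the regression method used in the cited studies; the function name, predictors and response values are hypothetical, and the response is generated without noise so that the fitted coefficients can be checked.

```python
def fit_linear_model(X, y):
    """Ordinary least squares with an intercept: solve (X^T X) b = X^T y
    by Gauss-Jordan elimination with partial pivoting."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    # Matrix is now diagonal; read off the coefficients
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical predictors per sample: (phosphoprotein signal, chemokine level)
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
# Hypothetical apoptosis scores, generated as 2 + 0.5*signal - 1.5*chemokine
y = [2.0, 2.5, 0.5, 1.0, 1.5]
coef = fit_linear_model(X, y)  # intercept, then one coefficient per predictor
```

In a model of this kind, a negative coefficient means that higher levels of a signal are associated with less of the measured response (here, less apoptosis), which is the kind of association behind a "protective" prediction. With many correlated predictors, solving the normal equations directly becomes numerically fragile, and methods such as partial least squares regression are often used instead.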

In addition to constructing network models that span intra- and intercellular signalling processes, it will be necessary to consider nonlinear relationships such as how the network behaves over time (that is, the dynamics of the system). This is particularly true in light of evidence (discussed above) that the magnitude and timing of the host response to respiratory viruses are crucial determinants of disease outcome. Similar evidence is accumulating that the outcome of HIV infection is also related to the activation dynamics of host gene regulation77,78. However, high-throughput data sets are typically static snapshots and are often inadequate for modelling dynamic systems. To help overcome this limitation, inference methodologies are being devised that reinterpret activity differences caused by system perturbations as differences in observation time. For example, changes in pathogen-induced gene expression that are associated with genetic variability, whether in the pathogen (for example, mutant viruses) or the host (as occurs in the Collaborative Cross mouse model), can potentially be used to infer the crucial dynamics of a system indirectly, without having to measure the system over time. This innovative approach has only recently become feasible, owing to major advances in the theory of dynamical systems and in geometrical methods for high-dimensional analysis79,80.

As virology continues its transition into a more quantitative science, increasing attention must be paid not only to network dynamics but also to other nonlinear interactions, such as the cooperative or synergistic relationships that characterize so much of biology. Efforts to discover biomarkers and to identify molecular predictors of adjuvant, vaccine or, more generally, drug efficacy remain largely unsuccessful because nonlinear interactions between molecules, as well as genetic diversity in populations, are not taken into sufficient consideration. Geometrical methods (that is, methods that reveal the structure in data by identifying spatial and temporal relationships) are increasingly being used in the analysis of high-throughput molecular data. In particular, novel combinations of geometrical methods, such as those based on singular value decomposition (SVD) and multidimensional scaling (MDS)81,82, are beginning to be used in systems virology to better understand the nonlinear interactions between variables and to isolate those interactions from biological noise.
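To illustrate how an SVD-type decomposition turns pairwise distances into coordinates, the sketch below implements classical (Torgerson) multidimensional scaling. Squared distances are double-centred to give a Gram matrix, and the leading eigenvectors of that matrix (which, for a symmetric positive semidefinite matrix, coincide with its singular vectors) are scaled by the square roots of their eigenvalues. Power iteration with deflation is used only to keep the example free of third-party libraries; the function name and test points are hypothetical, and the specific SVD–MDS combinations described in refs 81,82 may differ from this textbook version.

```python
import math


def classical_mds(D, k=2, iters=500):
    """Embed n items in k dimensions from an n x n symmetric distance matrix D."""
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    # Double-centred Gram matrix B = -1/2 * J D^2 J
    B = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        # Deterministic, non-constant unit starting vector for power iteration
        v = [1.0 + 0.01 * i for i in range(n)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector v gives its eigenvalue
        Bv = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(a * b for a, b in zip(v, Bv))
        for i in range(n):
            coords[i][c] = v[i] * math.sqrt(max(lam, 0.0))
        # Deflate so that the next pass finds the next-largest component
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Hypothetical items: the corners of a 3 x 4 rectangle
corners = [(0, 0), (3, 0), (0, 4), (3, 4)]
D = [[math.dist(p, q) for q in corners] for p in corners]
Y = classical_mds(D, k=2)
```

In a transcriptomic setting, D would hold distances between the expression profiles of samples, and the recovered coordinates could be plotted to see how samples group. For the hypothetical rectangle, the embedding reproduces the original pairwise distances up to rotation and reflection.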

In a study of patients who were infected with HCV and had received liver transplants, SVD–MDS analysis of transcriptomic data from liver biopsy samples, combined with categorical analysis (to take into account variables such as age, time post-transplantation and fibrosis score), identified a molecular signature for patients at risk of developing severe fibrosis83. SVD–MDS and co-abundance networks (which relate molecules to each other on the basis of their abundance profiles) were also used to integrate proteomic and metabolomic data sets obtained from the same cohort of patients. This strategy identified a potential role for oxidative stress in rapid fibrosis progression after transplantation, as well as serum metabolites that might prove useful as biomarkers for predicting progression to fibrosis84. This understanding of network structure can now be used to simulate human liver metabolism using novel flux balance modelling approaches, with the aim of better understanding and eventually treating disease85. Additional geometrical approaches have been described that should prove useful for bridging different technologies86,87 and for integrating diverse types of data, thereby enabling better analysis of the data already available in public databases and repositories. Finally, geometrical methods, together with the links between geometry, information theory and probability theory7, will also help to identify causal relationships88, which remains an unmet challenge. Unlike purely statistical approaches, geometrical methods can integrate different individual measurements, allowing them to be compared and combined into coherent objects that identify relationships between genes, transcripts or proteins.
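A co-abundance network of the kind described above can be sketched by correlating every pair of abundance profiles measured across the same samples and keeping an edge when the correlation is strong. Pearson correlation and the 0.8 cutoff are assumptions for illustration; the cited study may have used different association measures or thresholds, and the function names, molecule names and values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def coabundance_network(abundances, threshold=0.8):
    """Edges between molecules whose |r| across samples is at least `threshold`."""
    edges = {}
    for a, b in combinations(abundances, 2):
        r = pearson(abundances[a], abundances[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges


# Hypothetical abundances of one protein and three metabolites in five samples
abundances = {
    "protein_P1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "metabolite_M1": [2.1, 3.9, 6.2, 8.0, 9.9],
    "metabolite_M2": [5.0, 4.0, 3.0, 2.0, 1.0],
    "metabolite_M3": [3.0, 1.0, 4.0, 1.0, 5.0],
}
network = coabundance_network(abundances)
```

Keeping the sign of r on each edge distinguishes molecules that rise together from those that move in opposite directions; in this toy data, metabolite_M3 is only weakly correlated with the others and is left unconnected.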

Conclusions

After being extensively hyped as a paradigm shift, systems-level approaches have since been criticized for failing to rapidly fulfil their initial grand promises. Standing in the way have been numerous technical, experimental and mathematical hurdles. However, as discussed in this Review, significant progress is being made, and new computational approaches are leading the way. Also of importance is the ever-growing availability of high-throughput data in public databases; indeed, data sharing is crucial to the future success of systems virology (Box 4). To date, bench work has been the necessary precursor to computational approaches, but we have now reached a point at which sufficient data are available for computational methods to be the starting point for making discoveries, generating hypotheses and, in turn, guiding targeted bench work. A prime example of this is the flourishing of virtual screening methods in drug discovery; these methods depend on systematic drug characterization efforts and on public databases holding functional genomics information obtained under standard experimental conditions89,90. Moreover, the information gained from systems approaches forms the basis for what has been termed P4 (personalized, predictive, preventive and participatory) medicine91. In the case of infectious disease, genetic information on the individual and the pathogen, disease-predictive molecular signatures, targeted risk reduction and prophylactic measures, and active patient participation will merge into a new approach to medical care92.

In conclusion, contemporary virology cannot afford to simply catalogue myriad circumstantial observations. Systems-level approaches provide the opportunity to assemble the incomplete puzzle of biology in a meaningful way that will advance our understanding of how viruses cause disease and lead to improved patient care. Although there will always be a need for traditional microbiology, the success of today's undergraduates will depend on their ability to combine traditional skills with systems approaches and mathematics. The time is ripe, the data are here, and the mathematics to put them together is coming along. Take heed of the words on Plato's doorstep: “αγεωμετρητος μηδεις εισιτω” (“Let no one ignorant of geometry enter”).