We performed a survey to identify the cellular proteins and associated complexes interacting with 70 viORFs inducibly expressed from an identical genomic locus in a human cell line (HEK293 Flp-In TREx) competent for innate antiviral programs12,13 (Fig. 1a). This set-up allowed us to gauge the expression levels of the viral proteins and to assess the formation of endogenous protein complexes under physiological conditions in human cells14. We selected the viORFs to cover four groups of viruses representative of ten different families and checked for their correct expression (Supplementary Figs 1, 2a–c and 3 and Supplementary Table 1)15 and, in selected cases, immune modulatory activity (Supplementary Fig. 2d, e)16,17. We isolated interacting cellular proteins by tandem affinity purification (TAP) and analysed purified proteins by one-dimensional gel-free liquid chromatography tandem mass spectrometry (LC–MS/MS) (Supplementary Fig. 4a, b)18. The 70 viORFs specifically interacted with 579 cellular proteins with high confidence, resulting in 1,681 interactions (Fig. 1a, Supplementary Fig. 4c and Supplementary Table 1; see Methods for details). To validate our approach we assessed the impact of viral infection on the identified viORF–host-protein interactions with the use of several cognate viruses and found decreased numbers of co-purifying proteins, probably as a result of decreased cellular viability as well as competition with the tagged viORF (Supplementary Fig. 5). In addition, treatment with type I interferon (IFN) (Supplementary Fig. 4d) to simulate a host immune response had little effect on the interaction pattern of selected viORFs (Supplementary Fig. 5).

Figure 1: Host factor survey set-up and general properties of the data set.

a, Workflow of the host factor survey. b, Topological network properties of proteins identified as targets of viral proteins. The histograms compare the average property of proteins in the humPPI with the entire group of viORF interactors, or with viORFs derived from viruses with DNA and RNA genomes, respectively.

Of the 579 cellular proteins identified as interacting with the 70 viORFs, there was a strong enrichment for proteins associated with innate immunity, further validating the approach and potentially revealing additional unknown components of the host antiviral defence network (overlap with InnateDB database19; P < 2.3 × 10^−47) (Supplementary Fig. 6a and Supplementary Table 2). There was also a strong enrichment for ubiquitously expressed proteins20 (P < 2.2 × 10^−138) and for evolutionarily conserved proteins (P < 2.2 × 10^−16) consistent with the coevolution of virus–host relationships (Supplementary Fig. 6b–d and Supplementary Table 3).
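
As an illustration of how an overlap P value of this kind can be obtained, the sketch below computes a one-sided hypergeometric tail probability in Python. The choice of test and the name `hypergeom_upper_tail` are assumptions made for illustration; the text does not state which test was used, and the numbers in the usage line are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) when n proteins are drawn from N, of which K are annotated."""
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of observing k, k+1, ..., upper annotated proteins.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Hypothetical numbers: all 5 targets annotated, 5 of 10 background proteins annotated.
p = hypergeom_upper_tail(N=10, K=5, n=5, k=5)
```

With realistic set sizes the same function applies unchanged, because Python integers do not overflow.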

To obtain a more comprehensive view of how viORFs influence host cell processes, we used quantitative information from the mass spectrometry data to compute the strength of impact of each viORF on its cellular targets, and used these quantitative parameters in all subsequent analyses. We also incorporated data from the human protein–protein interactome (humPPI) assembled from public databases, to analyse the protein network associated with the viORF-interacting cellular targets. We found that in comparison with an average human protein, the average viral target was distinct in four ways: it was significantly more connected to other proteins; it was in a more central network position; it participated in more cellular pathways; and it was more likely to be engaged in central positions within these pathways (Fig. 1b and Supplementary Fig. 6d, e). These properties are consistent with a strong influence on pathways and effective control of biological networks21, which is in line with the parsimonious use of viral genetic material, and coevolution of the virus with the host organism.

Our large host-factor survey using a defined cellular set-up offers the unique opportunity to identify host-cell perturbation strategies pursued by individual viruses, families and groups. On the basis of the humPPI, 70% of the viORF-interacting cellular factors formed a coherent protein–protein interaction network (Supplementary Fig. 7a). When mapped on the entire humPPI, viral targets seemed to occupy central positions (Supplementary Fig. 7b). We also grouped the cellular targets on the basis of their interaction with viORFs from single-stranded (ss) or double-stranded (ds) RNA or DNA viruses and found that about half of the viORF targets linked to a single viral group, and the rest interacted with viruses of more than one group (Fig. 2a). Statistically significant enrichment for individual gene ontology (GO) terms, representing categories of biological processes, could be identified for each subnetwork. Proteins targeted by ssRNA(−) viORFs were enriched for processes related to protection of the viral genome and transcripts from degradation or detection by the host, and for those promoting efficient viral RNA processing (Fig. 2a). This is illustrated by the interaction between NS1 of influenza A virus (FluAV) with the 5′→3′ exoribonuclease XRN2, and among the NSs protein of Rift Valley fever virus, the mRNA export factor RAE1 and the nuclear pore complex protein NUP98. In contrast, dsRNA virus targets were enriched for protein catabolic processes (Fig. 2a) with both rotaviruses and reoviruses (NSP1 and σ3) engaging SKP1–CUL1–F-box protein complexes (containing FBXW11, Cullin-3, and Cullin-7 and Cullin-9, respectively), which mediate protein degradation.

Figure 2: Network of identified targets and network perturbation induced by viORFs.

a, Network representation of all the viORF–target-protein interactions with viral targets grouped according to the genome type of the interacting viORF(s). Proteins identified in the negative control cell line were subtracted as non-specific binders. Triangles represent viORFs; circles represent viral target proteins. Protein interactions functionally validated in detail in the study are marked in dark red. Up to three GO terms significantly enriched in the corresponding viral target subsets are shown around the network to highlight specific functions. b, viORFs targeting one or two proteins that physically interact and are involved in one or more biological processes have the potential to perturb communication or synchronization within or between the given process(es). Significant perturbations were determined (P < 0.001) using targets of viORFs derived from DNA or RNA viruses; edge thickness represents a normalized perturbation score.

To determine which cellular signalling pathways are targeted by viORFs and to look for differences between DNA and RNA viruses, we used the Kyoto Encyclopedia of Genes and Genomes (KEGG) annotations (Supplementary Table 4). Clear distinctions in preferences were observed between the different viral groups, with viORFs of RNA viruses targeting the JAK–STAT and chemokine signalling pathways, as well as pathways associated with intracellular parasitism, and viORFs of DNA viruses targeting cancer pathways (glioma, acute myeloid leukaemia and prostate cancer) (Supplementary Table 4). Among the viral targets that are involved in multiple cellular pathways were two catalytic and three regulatory subunits of the phosphatidylinositol-3-OH kinase family, identified with the FluAV NS1 protein and with the TLR inhibitory protein A52 of vaccinia virus (VACV) (Supplementary Fig. 8a)4. We functionally validated these interactions and identified a critical role for one of the catalytic subunits (PIK3CA) in TRIF-mediated IFN-β promoter activation (Supplementary Fig. 8b–d).

The higher probability of viORFs targeting cellular proteins that link different pathways (Fig. 1b and Supplementary Fig. 6d) prompted us to map which of these pathway connections were preferentially targeted and thus were probably disrupted (Fig. 2b), and to compare the disruption patterns brought about by viORFs from DNA viruses with those from RNA viruses. About one-third of the connections between specific cellular processes were hit by both viral types, suggesting a similar mechanism of perturbing the host cells. viORFs from DNA viruses preferentially targeted proteins linking the cell cycle with either transcription or chromosome biology, possibly reflecting the necessity of uncoupling viral replication from cellular growth. In contrast, RNA viruses targeted proteins involved in RNA metabolism and also protein and RNA transport, while preferentially disrupting the link between signalling and immunity-related processes (Fig. 2b).

To integrate our viORF–host-protein interaction data sets with intracellular events occurring after viral infection we compared our viORF interaction proteomic profile with the transcriptional profile obtained after infection of the cells with hepatitis C virus (HCV) (Supplementary Table 5). The protein-processing pathway in the endoplasmic reticulum (ER) (Supplementary Fig. 9a) was the most affected process. The HCV viORFs specifically targeted six ER-associated proteins. To analyse the broader implications of this targeting on the cell, we identified the cellular proteins known to bind to these six ER targets and analysed their functions bioinformatically (Supplementary Fig. 9b). Of the 80 cellular protein interactors, 42 were enriched in either cell-cycle or apoptosis functions (Supplementary Fig. 9c). Ubiquitin-specific peptidase 19 (USP19), a deubiquitinating enzyme involved in the unfolded protein response22, interacted with the viORF NS5A. To study the biological relevance of this interaction, we analysed the localization of USP19 after HCV infection and found that it relocalized to HCV replication compartments in replicon-containing cells, probably disrupting its cellular function (Supplementary Fig. 10a, b). Indeed, NS5A inhibited the ability of USP19 to rescue destabilized green fluorescent protein (GFP) that was degraded by the proteasome (Fig. 3a). In addition, infection of cells with wild-type HCV decreased cell growth23, whereas infection with recombinant virus lacking the NS5A–USP19 interaction site, which mapped to 50 amino acids in domain III (Supplementary Fig. 10c–g), did not (Fig. 3b and Supplementary Fig. 10h). Thus, the cell-proliferation-inhibitory properties of NS5A are probably mediated by its inhibition of USP19, which is known to promote cell growth24. This finding implicates the targeting of ER-resident proteins and proteostasis as an important viral perturbation strategy.

Figure 3: Functional validation of USP19, hnRNP-U and WNK kinases as viral targets.

a, 293T cells transfected with GFP fused to a proteasomal degradation signal (Ub-R–GFP), Myc-tagged USP19 and NS5A. MG132 was added for 6 h, and cells were analysed by immunoblotting (IB). b, Huh7.5 cells expressing firefly luciferase were infected with wild-type (WT) HCV or HCV lacking the USP19 interaction site (Δ2354–2404). Results are activities after 96 h (means ± s.d. for three independent experiments). Asterisk, P < 0.025, Student’s t-test. MOI, multiplicity of infection. c, FluAV minireplicon activity in the presence of 33 ng of Myc–hnRNP-U or GRB2 (control) and 33 ng of NS1 (A/PR/8/34). Results are FluAV polymerase (Pol) activity (means ± s.d. for duplicate measurements, one representative of three). Immunoblots show protein expression (24 h). d, Co-immunoprecipitation of Flag–hnRNP-U and indicated mutants with HA–NS1 (A/PR/8/34). e, As in c, but for 100 ng of GFP (control) and Flag–hnRNP-U mutants (means ± s.d. for duplicate measurements, one representative of three). f, g, NFκB-luciferase (luc) activity in HEK293 cells in the presence of increasing amounts (in ng) of WNK1 and WNK3 with or without IL-1 (f), and K7 (g). Results are measurements after 24 h (means ± s.d. for triplicate experiments). h, HeLa cells transfected with siRNAs against WNKs and non-silencing control were infected. TCID50, 50% tissue culture infective dose (mean ± s.d., n = 3).

The heterogeneous nuclear ribonucleoprotein U (hnRNP-U) was among the most frequently targeted cellular proteins in the analysis (Supplementary Figs 11 and 12a and Supplementary Table 6) and has previously been reported to restrict growth of HIV25. Overexpression of hnRNP-U inhibited the polymerase activity of FluAV and the growth of vesicular stomatitis virus (VSV) (Supplementary Fig. 12b and data not shown). This inhibitory effect was alleviated by coexpression of NS1 (FluAV), establishing a functional link to hnRNP-U (Fig. 3c). We mapped the NS1 interaction site on hnRNP-U to the carboxy-terminal Arg-Gly-Gly (RGG) domain (Fig. 3d and Supplementary Fig. 12c)26. The RGG domain bound viral RNA in infected cells (Supplementary Fig. 12d), and an hnRNP-U mutant lacking this domain was defective in antiviral polymerase inhibition (Fig. 3e), suggesting that hnRNP-U inhibits the replication of RNA viruses through viral RNA interaction. Collectively, the analysis highlights hnRNP-U as an important antiviral protein and a hotspot of viral perturbation strategies.

Of the 70 viORFs used in the study, only K7 of VACV27 interacted with members of the WNK family (Supplementary Figs 11 and 13a–e and Supplementary Table 6), which are regulators of ion transport and are implicated in cancer28. Subsequent analyses on the potential role of this protein family in the antiviral immune response revealed that WNK1 and WNK3, but not WNK2 or WNK4, synergized with interleukin-1 (IL-1)-stimulated activation of the p38 kinase (Supplementary Fig. 13f), and activated a NF-κB reporter construct alone or in combination with IL-1 (Fig. 3f), which was inhibited by coexpression of K7 (Fig. 3g). Expression of WNK3 stimulated IL-8 production alone or in combination with IL-1 (Supplementary Fig. 13g). Short interfering RNA (siRNA)-mediated knockdown of various WNK family members resulted in increased growth of VSV (Fig. 3h and Supplementary Fig. 13h). These results illustrate the value of our proteomics data set by revealing a previously unknown role for WNK kinases in the antiviral immune response.

Proteomic profiling of such a large group of viral regulators of cell function offers the opportunity to explore kinship in their mode of action and, by inference, the perturbation strategy of the viruses that encode them. We defined a notion of kinship distance by incorporating shared targets, proximity in the humPPI of non-shared targets, and their strength of interactions. viORFs from the same viral family had short average kinship distances (Supplementary Fig. 14), consistent with their evolutionary relationship. Notable exceptions were viORFs from paramyxoviruses, which had an average distance even larger than randomized viral target profiles, possibly reflecting a particularly pleiotropic mechanism of action. We generated a dendrogram that showed the kinship distance of the individual viORFs as a proxy for the perturbation strategy of the cognate virus (Fig. 4). Roughly half of the viORFs clustered in a central, rather dense part of the tree, reflecting overlapping strategies, whereas the other half was more distant, probably indicating more unique targeting strategies. Many clusters represented viORFs from evolutionarily related viruses, which are more likely to exercise comparable perturbation strategies. For example, most influenza A virus NS1 proteins and all NSs proteins from bunyaviruses clustered together. A few viORFs did not cluster according to their genome group, which was evocative of some degree of evolutionary convergence with the proteins of other viruses on shared pathways, or more distinctive routes of action, possibly as part of a combined attack with another ORF of the same virus. This is best illustrated by the five viORFs from VACV, which were found scattered in the tree and were likely to have evolved to fulfil specific, complementary functions.

Figure 4: Similarities of viORF actions.

Dendrogram of viORF relationships based on the kinship distance, which integrates the number of shared targets and the network distance in the humPPI of the distinct targets. The virus genotype that the individual viORF derives from is shown in a colour code in the circle around the dendrogram. EBOV, Ebola virus; hCMV, human cytomegalovirus; HCV, hepatitis C virus; HeV, Hendra virus; HSV, herpes simplex virus; HSV1, herpes simplex virus 1; KSHV, Kaposi’s sarcoma-associated herpesvirus; LaCV, La Crosse virus; LCMV, lymphocytic choriomeningitis virus; MARV, Marburg virus; MCMV, murine cytomegalovirus; MeV, measles virus; NDV, Newcastle disease virus; NiV, Nipah virus; PIV2, parainfluenza virus 2; ReoV, reovirus; RotaV, rotavirus; SFSV, sandfly fever Sicilian virus. viORFs from VACV are indicated with a star.

Our results demonstrate that viruses have evolved to exploit a variety of cellular mechanisms, and suggest that the host cell relies on the proper homeostatic regulation across these diverse cellular processes to detect, alert to and counteract pathogen interference. In addition, the study provides a rationale for considering or excluding the targeting of specific intracellular pathways for pan-viral or virus-specific antiviral therapy.

Methods Summary

Complementary DNA of tandem affinity-tagged viORFs was amplified by polymerase chain reaction and cloned into the pTO-SII-HA-GW vector by using Gateway recombination (Invitrogen). The resulting plasmids were used to generate hygromycin-selected stable isogenic HEK293 Flp-In TREx cell lines, and viORF expression was stimulated by doxycycline12. Protein complexes isolated by tandem affinity purification using Strep-II and haemagglutinin (HA)-affinity reagents were analysed by LC–MS/MS with an LTQ Orbitrap XL, an LTQ Orbitrap Velos or a QTOF mass spectrometer. The data were searched against the human SwissProt protein database, using Phenyx and Mascot. The humPPI was generated using public interaction databases. Recombinant HCVs (strain JC1) with mutations in domain III of NS5A were generated by transfecting full-length genomic RNA with targeted deletions in the NS5A region. Subcellular localization of proteins was performed on a Leica SP2 confocal microscope. The influenza virus replicon assay was performed as described previously12.

Online Methods

Plasmids, viruses and reagents

Expression constructs were generated by PCR amplification of viORFs followed by Gateway cloning (Invitrogen) into the plasmids pCS2-6myc-GW, pCMV-HA-GW and pTO-SII-HA-GW. pCAGS-Flag-hnRNP-U and mutants thereof were provided by S. Nakagawa. Ub-R–GFP and Myc–USP19 were published previously22. pHA-PIK3R2 was from Oliver Hantschel. GFP–NS5A domain mutants were published previously30. Recombinant HCV variants with mutations in domain III of NS5A were generated by replacing the NS5A fragment in pFK-Jc1-NS5A-HA, containing the full-length HCV chimaeric Jc1 genome31 in which a HA tag is inserted in frame within NS5A and in pFK-JcR-2a containing Renilla luciferase fused amino-terminally with the 16 N-terminal amino-acid residues of the core protein and C-terminally with the foot-and-mouth disease 2A peptide coding region, enabling direct quantification of viral replication by measuring Renilla luciferase activity32. All viruses were produced by transient transfection of Huh7.5 cells with RNA transcribed in vitro. Recombinant RVFV (Rift valley fever virus)33 expressing tandem affinity-tagged (GS-TAG) versions of NSs proteins were generated by replacing the RVFV NSs open reading frame with GS-tagged versions of NSs that were generated by PCR amplification. The FluAV minireplicon system to measure FluAV polymerase activity34, IFN-β–luciferase, NF-κB-luciferase and the Renilla luciferase control plasmid (pRL-TK; Promega) were described previously35.

Streptavidin beads were from IBA (Strep-Tactin agarose); HA–agarose (clone HA7) was from Sigma. Antibody against β-tubulin (anti-β-tubulin; clone DM1A) was from Abcam, and anti-β-actin (catalogue number AAN01) was from Cytoskeleton. IRDye-conjugated anti-c-Myc (catalogue number 600-432-381) and anti-rabbit (catalogue number 611-732-127) secondary reagents were from Rockland. Alexa Fluor 680-conjugated goat anti-mouse antibody (catalogue number 10524963) was from Molecular Probes. Reagents for quantitative RT–PCR were from Qiagen. Poly(dA)•poly(dT) was from Sigma and was transfected with Lipofectamine 2000 (Invitrogen) or Polyfect (Qiagen). Stimulatory PPP-RNA was described previously12. MG132 was from Sigma. IFN-β and IFN-α2a were from PBL Interferonsource. Tumour necrosis factor-α and IL-1β were from Pierce. IL-8 was measured by enzyme-linked immunosorbent assay (BD). Lymphocytic choriomeningitis virus (Armstrong strain), FluAV (A/PR/8/34), VSV (Indiana strain) and VSV-M2 (mutant VSV with M51R substitution of the matrix protein, leading to IFN-α/β induction; originally called AV3) have been described previously12. Virus titres were measured by determining the half-maximal infectious dose (TCID50) on Vero cells, or on Huh7.5 cells for HCV.

Cells, co-immunoprecipitations and imaging

HEK293 Flp-In TREx cells that allow doxycycline-dependent transgene expression were from Invitrogen. HEK293, 293T, HeLa S3 (ref. 12), Lunet, Lunet-Neo-sgNS5A(RFP), Huh7/5.2 and Huh7.5 cells have been described previously30. Highly permissive Huh7.5 or Huh7.5 FLuc, stably expressing firefly luciferase introduced by lentiviral transduction32, were used for HCV infection experiments. Fibroblasts were kept in DMEM medium (PAA Laboratories) supplemented with 10% (v/v) FCS (Invitrogen) and antibiotics (100 U ml^−1 penicillin and 100 μg ml^−1 streptomycin). For inducible transgene expression, HEK293 Flp-In TREx cells were treated with doxycycline (1 μg ml^−1) for 24–48 h, the duration depending on cell density, so that the cells just reached confluence. For siRNA-mediated knockdown, if not stated otherwise in figure legends, 5 nmol of siRNA pool (Supplementary Table 7) was mixed with HiPerfect (Qiagen) and added to 10^5 HeLa S3 cells. After 48 h, cells were used for experiments. For co-immunoprecipitations 293T cells were transfected with expression plasmids for 24–48 h and lysates were used for affinity purification in TAP buffer12 using anti-HA–agarose or anti-c-Myc-coated beads. For protein detection in western blot analysis a Li-Cor infrared imager was used. Confocal images were acquired with a Leica SP2 confocal microscope.

Affinity purification, mass spectrometry and transcriptome analysis

HEK293 Flp-In TREx cells and isolation of protein complexes by TAP and peptide analysis by LC–MS/MS have been described previously18. Proteins identified by this method can be found in a complex but do not necessarily bind directly to each other. In brief, five subconfluent 15-cm dishes of cells were stimulated with 1 μg ml^−1 doxycycline for 24–48 h. Protein complexes were isolated by TAP using streptavidin agarose followed by elution with biotin, and a second purification step using HA–agarose beads. Proteins were eluted with 100 mM formic acid, neutralized with triethylammonium bicarbonate (TEAB) and digested with trypsin, and the peptides were analysed by LC–MS/MS36. For bunyavirus NSs proteins, recombinant viruses33 containing GS-tagged NSs proteins were generated. Protein complexes were denatured in Laemmli buffer37 and separated by one-dimensional SDS–PAGE; entire lanes were excised and digested in situ with trypsin and the resultant peptides were analysed by LC–MS/MS. Mass spectrometric analysis was performed for gel-free and gel-based samples, respectively, on a hybrid LTQ Orbitrap XL, an LTQ Orbitrap Velos mass spectrometer (both from ThermoFisher Scientific) or on a quadrupole time-of-flight mass spectrometer (QTOF Premier; Waters) coupled to an 1100/1200 series high-performance liquid chromatography system (Agilent Technologies). Data generated by LC–MS/MS were searched against the human SwissProt protein database (v. 2010.09, plus appended viral bait proteins) with Mascot (v. 2.3.02) and Phenyx (v. 2.6). One missed tryptic cleavage site was allowed. Carbamidomethyl cysteine was set as a fixed modification, and oxidized methionine was set as a variable modification. A false-positive detection rate of less than 1% on the protein groups was imposed (Phenyx z-score more than 4.75 for single peptide identifications, z-score more than 4.2 for multiple peptide identifications; Mascot single peptide identifications ion score more than 40, multiple peptide identifications ion score more than 14).

To measure gene expression, Huh7/5-2 cells were left uninfected or infected with HCV (strain JC1) at a MOI of 5, and RNA was isolated using Trizol (Invitrogen) after 4, 12, 24, 48 and 72 h. Gene expression analysis was performed in duplicate using an Affymetrix platform (Affymetrix Human Genome U133A 2.0 Array).

Bioinformatic analysis

Data filtering. All proteins identified in the GFP negative controls (51 proteins) were removed.

Data normalization. Affinity-purification MS experiments were performed with two biological replicates and two technical replicates for each; that is, four replicates. We first normalized individual replicates according to the NSAF procedure29. The replicates of each viORF normalized data element were then assembled in a table with 0 for missing detection, and each viral target was assigned the average NSAF value across the replicates. On the basis of a robust estimate (MAD) of the coefficient of variation (Supplementary Fig. 15a) we further penalized highly variable targets by applying a reduction factor between 1 (modest variability) and 0.5 (high variability) (Supplementary Fig. 15b). Direct normalization through a division by the standard deviation was excluded because of the limited number of replicates available. For a given viORF v and a viral target p, the weight given to the interaction vp was hence computed as

$$\mathrm{strength}_{v,p} = r_{v,p} \cdot \frac{1}{4} \sum_{i=1}^{4} \mathrm{NSAF}_{v,p,i}$$

where i accounts for the replicates, $\mathrm{NSAF}_{v,p,i}$ is the normalized value of target p in replicate i (0 if not detected) and $r_{v,p}$ is the variability-dependent reduction factor between 0.5 and 1. The distribution of strength values is shown in Supplementary Fig. 15c.
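
The weighting of an interaction can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names, the thresholds `cv_low` and `cv_high`, and the linear ramp between the reduction factors 1 and 0.5 are hypothetical, because the text gives only the range of the factor. NSAF is taken in its usual form, spectral count divided by protein length and normalized to sum to 1 within a replicate.

```python
from statistics import mean, median

def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factors for one replicate."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: s / total for p, s in saf.items()}

def robust_cv(values):
    """MAD-based coefficient of variation (MAD scaled by 1.4826, over the median)."""
    med = median(values)
    if med == 0:
        return float("inf")  # detected in too few replicates to estimate
    mad = median(abs(v - med) for v in values)
    return 1.4826 * mad / med

def reduction_factor(cv, cv_low=0.5, cv_high=1.5):
    """Assumed linear ramp: 1.0 at or below cv_low, 0.5 at or above cv_high."""
    if cv <= cv_low:
        return 1.0
    if cv >= cv_high:
        return 0.5
    return 1.0 - 0.5 * (cv - cv_low) / (cv_high - cv_low)

def strength(replicate_nsaf):
    """Penalized mean NSAF of one viORF-target pair; 0 marks a missing detection."""
    return reduction_factor(robust_cv(replicate_nsaf)) * mean(replicate_nsaf)
```

A target seen consistently in all four replicates keeps its mean NSAF, whereas one seen in a single replicate is halved under these assumed thresholds.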

Human interactome. We integrated human physical protein–protein interactions (humPPI) obtained from public databases (IntAct, BioGRID, MINT, HPRD and InnateDB19) and thereby obtained an interactome (largest connected component) comprising 13,350 proteins and 90,292 interactions.
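
Restricting a merged interaction list to its largest connected component can be sketched with a breadth-first search. The function name and the toy edge list are hypothetical, and the sketch assumes that protein identifiers have already been mapped to a common namespace across databases.

```python
from collections import defaultdict, deque

def largest_connected_component(edges):
    """Return the node set of the largest connected component of an undirected graph."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            adj[a].add(b)
            adj[b].add(a)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

# Toy example: pairs pooled from several (hypothetical) sources, duplicates allowed.
edges = [("A", "B"), ("B", "C"), ("B", "A"), ("D", "E")]
component = largest_connected_component(edges)
```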

Human central proteome. A list of commonly expressed human proteins was assembled by merging a previous study20 with mapped (orthologues) mouse proteins found in the intersection of six mouse tissues38 and genes expressed in all except four or fewer tissues from SymAtlas. The resulting list included 4,276 proteins and is provided as Supplementary Table 8.

Network topological measures. We retained two classical measures: the connectivity (degree)—that is, the number of interactions of one protein in the PPI—and the relative betweenness centrality, which is equal to the relative number of shortest paths between any two proteins that go through a given protein.
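
Both measures can be computed on an unweighted, undirected network as sketched below. The betweenness routine follows Brandes' algorithm and is normalized by the number of ordered pairs of other proteins, so that values lie between 0 and 1. The function names and the adjacency-dictionary input format are assumptions for illustration.

```python
from collections import deque

def degree(adj):
    """Number of interaction partners per protein; adj maps node -> set of neighbours."""
    return {v: len(neighbours) for v, neighbours in adj.items()}

def relative_betweenness(adj):
    """Fraction of shortest paths between other node pairs that pass through each node."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    pairs = (n - 1) * (n - 2)  # ordered pairs that exclude the node itself
    return {v: (b / pairs if pairs else 0.0) for v, b in bc.items()}

path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
```

On the three-node path, the middle protein lies on every shortest path between the other two and so receives the maximum value.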

MS-weighted measures. To compute a weighted characteristic C of the targeted host proteins, for example connectivity in the human PPI, of one viral modulator vm we used

$$C(vm) = \sum_{p \in T(vm)} \alpha_p \, C(p)$$

where T(vm) is the set of all human proteins targeted by vm and C(p) is the characteristic of protein p; the weights $\alpha_p$ were proportional to the estimated interaction strength, and sum to 1. When the same viral modulator was considered in several viruses (for example NS1 of FluAV), we computed the weights for each interacting protein taking the maximum of the strengths found in different viruses to avoid any bias by over-represented viral modulators; that is, $\alpha_p \propto \max_{v \in \mathrm{NS1\ viruses}} \mathrm{strength}_{v,p}$. Null distributions were generated by assigning actual weights to random proteins 10,000 times, thereby obtaining a histogram of 10,000 random weighted characteristics, which was fitted with a gamma distribution to estimate P values (Supplementary Fig. 15d).
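
The weighted characteristic and its randomization can be sketched as follows. One deliberate simplification is made: instead of fitting a gamma distribution to the random scores, the sketch reports an empirical upper-tail P value, which needs only the standard library. All function names and the toy data are hypothetical.

```python
import random

def merge_max(per_virus_strengths):
    """Combine one modulator's targets across viruses, keeping the maximum strength."""
    merged = {}
    for strengths in per_virus_strengths:
        for protein, s in strengths.items():
            merged[protein] = max(s, merged.get(protein, 0.0))
    return merged

def weighted_characteristic(strengths, characteristic):
    """Weighted mean of a per-protein characteristic; weights are rescaled to sum to 1."""
    total = sum(strengths.values())
    return sum(s / total * characteristic[p] for p, s in strengths.items())

def empirical_p_value(strengths, characteristic, n_random=10_000, seed=0):
    """Upper-tail P value obtained by assigning the actual weights to random proteins."""
    rng = random.Random(seed)
    observed = weighted_characteristic(strengths, characteristic)
    proteins = list(characteristic)
    weights = list(strengths.values())
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(proteins, len(weights))
        score = weighted_characteristic(dict(zip(sample, weights)), characteristic)
        hits += score >= observed
    return (hits + 1) / (n_random + 1)

# Toy data: 100 proteins with degrees 0..99; the modulator hits the two best connected.
degrees = {f"p{i}": i for i in range(100)}
p_value = empirical_p_value({"p99": 1.0, "p98": 1.0}, degrees, n_random=2000)
```

The empirical estimate cannot fall below 1/(n_random + 1), which is one practical reason to fit a parametric distribution when very small P values are expected.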

Weighted functional annotation analysis. We performed GO and KEGG pathways analysis integrating the interaction strengths of viORF targets by summing all the above normalized (sum equal to 1) αp weights found in a GO term or a pathway to obtain a score. This score was then compared with a null distribution modelled by a gamma fit on 1,000 random scores to estimate a P value. Random scores were obtained by assigning the weights to random proteins and summing those that fell in the GO term or pathway.

Perturbation map and relative position along a pathway. These two computations were performed in accordance with published methods20. Pathways were taken from NCI-PID39, and the perturbation map algorithm (GO fluxes in ref. 20) was modified to use the interaction strengths between viORFs and their targets as weights in scoring interaction between GO terms instead of constant weights. For simplification, GO terms were reduced to 14 categories (Supplementary Table 9). Perturbation map null distributions were obtained with 250 randomized annotated networks that respected the original network connectivity distribution and GO term frequencies.

Distance of viORFs. Given two viORFs x and y, the distance d(x,y) is defined as follows. Let S be the union of all x and y targets, Dx the targets unique to x, and Dy those unique to y. A preliminary distance c is computed by summing all the human interactome shortest path distances from individual targets in Dx and Dy with the targets unique to the other viORF, considering interaction strengths to penalize differences on strong different targets and minimize the impact of weaker distinct targets. Thus,

Finally, c is normalized to take into account the number of distinct targets compared with the total number of targets: d(x,y) = c · |D_x ∪ D_y| / |S|, where |...| denotes set cardinality, that is, the number of elements.
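
One possible reading of this distance is sketched below. The weighting of each shortest-path length by the product of the two normalized interaction strengths is an assumption made for illustration, as are the function names; the sketch also assumes that all targets lie in the same connected component of the interactome.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Breadth-first search distances from source in an unweighted network."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def kinship_distance(adj, x, y):
    """x and y map each target of a viORF to its normalized interaction strength."""
    union = set(x) | set(y)
    dx, dy = set(x) - set(y), set(y) - set(x)
    c = 0.0
    for p in dx:
        dist = shortest_path_lengths(adj, p)
        # Assumed weighting: strong, distant, non-shared targets contribute most.
        c += sum(x[p] * y[q] * dist[q] for q in dy)
    return c * (len(dx) + len(dy)) / len(union)

# Toy network: a path A-B-C-D; the two viORFs share target B only.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
x = {"A": 0.5, "B": 0.5}
y = {"B": 0.5, "D": 0.5}
d = kinship_distance(adj, x, y)
```

Two viORFs with identical target sets have distance 0 under this reading, because no target is unique to either of them.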

The random distance distributions were obtained as follows: for each viORF, its targets were replaced by a random selection of the same number of proteins from the humPPI such that the same pairs of (random) distances could be computed. The overall procedure was repeated 100 times and in the case of the HEK293 selection the human proteins randomly chosen were restricted to the humPPI and to proteins identified by mass spectrometric analysis of the HEK293 proteome20.