Cellular networks are complex systems that are perturbed in cancer. Therefore, integrating mutation data with robust gene interaction networks is a key step toward modeling the faulty logic at work in cancer cells.

Main

Cellular networks consist of genes that mutually regulate one another. Consequently, genomic alterations in one gene may propagate to disrupt functions in a diverse array of molecular processes encoded by the system.

Networks can therefore help in linking together seemingly very different events, just as a circuit diagram allows the understanding of how either of two light switches, when flipped on, can turn on a lamp at the other side of the room. Two patients with the same cancer subtype may carry completely different mutations, but knowing the way gene switching works can identify the underlying causal mechanism that leads to the same disease in both individuals.

Beyond being cataloged in general-purpose databases, cancer-specific genomic alterations can be used to understand the context in which recurrent mutations exert their effects. This can be achieved by linking mutational data to databases that catalog known functional relationships among gene products, such as protein-protein interactions, transcriptional regulation and pathway-based relationships.

However, current representations of the functional interactions governing cellular networks are fraught with false positives and false negatives and often fail to account for experimental systems, cellular contexts and repeatability of the observations. Therefore, algorithms that incorporate network information must assess the robustness of solutions and be designed to perform well in the face of uncertainty. Identifying and integrating explicit gene interaction networks is a key step toward building computational models that represent the perturbed gene regulatory systems of cancer cells with high fidelity.

Propagating effects of mutations on a gene-gene interaction network

In Hofree et al. (Nat. Methods doi:10.1038/nmeth.2651), network integration was achieved by combining new data with previously constructed gene-gene interaction networks. Integration of the two data sets was achieved by coupling genes (on the basis of an associated confidence statistic) if both had previously been reported in the literature to participate in the same biological process, for example, in protein-protein interaction, biochemical reaction or shared signaling logic.

Network-based stratification of tumor mutations

Matan Hofree, John Shen, Hannah Carter, Andrew Gross & Trey Ideker Nature Methods 10.1038/nmeth.2651

For each patient independently we project the mutation profiles onto a human gene interaction network obtained from public databases28, 29, 30. Next, the technique of network propagation31 is applied to spread the influence of each mutation profile over its network neighborhood (Fig. 1b). The result is a 'network-smoothed' profile in which the state of each gene is no longer binary but reflects its network proximity to the mutated genes in that patient along a continuous range [0, 1]. Following this 'network smoothing', patient profiles are clustered into a predefined number of subtypes (k = 2, 3, ... 12) using the unsupervised technique of non-negative matrix factorization32 (NMF; Fig. 1c). [...] Finally, to promote robust cluster assignments, we use the technique of consensus clustering33, in which the above procedure is repeated for 1,000 different subsamples in which subsets of 80% of patients and genes are drawn randomly without replacement from the entire data set. The results of all 1,000 runs are aggregated into a (patient × patient) co-occurrence matrix, which summarizes the frequency with which each pair of patients has cosegregated into the same cluster. This co-occurrence matrix is then clustered a second time to recover a final stratification of the patients into clusters/subtypes (Fig. 1d).
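
The smoothing step described above can be illustrated with a minimal Python sketch. It assumes a random-walk-with-restart update, F(t+1) = alpha * F(t) * A + (1 - alpha) * F(0), on a row-normalized adjacency matrix A. The excerpt does not state the update rule, so the rule, the value alpha = 0.7, the toy three-gene network and the function names are illustrative assumptions, not the authors' implementation.

```python
def normalize_rows(adj):
    """Row-normalize an adjacency matrix; rows with no edges stay all zero."""
    normalized = []
    for row in adj:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def propagate(profile, adj, alpha=0.7, tol=1e-9, max_iter=1000):
    """Spread a binary mutation profile over a network (assumed update rule).

    profile: list of 0/1 values, one per gene (1 = mutated in this patient)
    adj:     symmetric 0/1 adjacency matrix over the same genes
    alpha:   fraction of signal that flows to neighbors at each step (assumption)
    """
    a = normalize_rows(adj)
    n = len(profile)
    f0 = [float(x) for x in profile]
    f = f0[:]
    for _ in range(max_iter):
        # Each gene j receives signal from its neighbors plus a restart term.
        nxt = [
            alpha * sum(f[i] * a[i][j] for i in range(n)) + (1 - alpha) * f0[j]
            for j in range(n)
        ]
        if max(abs(x - y) for x, y in zip(nxt, f)) < tol:
            return nxt
        f = nxt
    return f


# Toy path network g0 - g1 - g2 with a single mutation in g0.
toy_adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
smoothed = propagate([1, 0, 0], toy_adj)
```

In this toy network the mutated gene keeps the highest score and the other genes receive progressively less as their network distance from it grows, which is the qualitative behavior the 'network-smoothed' profile is meant to capture.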

Determining the effects of the altered cancer genome on protein phosphorylation signaling

Phosphorylation of protein substrates by protein kinase enzymes is a central signal transduction mechanism used to regulate many cancer-relevant processes. In Reimand et al. (Sci. Rep. doi:10.1038/srep02651), numerous mutations were found to eliminate phosphorylation sites and, thus, abolish phosphorylation signaling. Additional somatic mutations were found to affect residues adjacent to phosphorylation sites and also caused rewiring of kinase signaling networks. By studying kinase target sequences in detail, the authors made predictions about mutations leading to oncogenic gain or loss of signaling in kinase-based signal transduction networks. Recurrent mutations and signaling pathway enrichments within the network in cancer were also analyzed, with mutations that affected signaling found in known cancer genes as well as in novel candidate cancer genes and pathways.

The mutational landscape of phosphorylation signaling in cancer

Jüri Reimand, Omar Wagih & Gary Bader Scientific Reports 10.1038/srep02651

To investigate cancer mutations in phosphorylation signaling, we collected 87,060 experimentally determined phosphosites in 10,185 human proteins and integrated these with 241,701 missense single nucleotide variants from the TCGA pan-cancer project. Including ±7 residues of phosphosite flanking regions and covering 7% of protein sequence, we found 16,840 phosphorylation-related SNVs (pSNVs) in 5,859 genes and 89% of all samples, over 17 times more pSNVs than previously discovered7. According to another measure of pSNV importance, 1,427 direct pSNVs replace the central phosphorylated residue and thus disrupt phosphorylation; such mutations are under-represented on the whole, although frequently seen in known cancer genes such as TP53 and CTNNB1 (79 cancer genes, p = 4.2 × 10⁻¹⁸, Fisher's exact test). In total, we predict a specific signaling mechanism for 29% of pSNVs (4,800) through either direct pSNVs or kinase network rewiring.
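
The distinction between direct and flanking pSNVs can be made concrete with a short Python sketch. It assumes 1-based residue positions within a single protein and treats any missense variant within ±7 residues of a phosphosite as phosphorylation-related. The function name and the example positions are hypothetical and do not refer to real phosphosites.

```python
def classify_psnv(variant_pos, phosphosites, flank=7):
    """Classify a missense variant relative to known phosphosites.

    variant_pos:  1-based residue position of the substituted amino acid
    phosphosites: set of 1-based positions of phosphorylated residues
    flank:        number of flanking residues considered on each side

    Returns 'direct' if the variant replaces a phosphorylated residue,
    'flanking' if it lies within the flanking window, otherwise None.
    """
    if variant_pos in phosphosites:
        return "direct"
    if any(abs(variant_pos - site) <= flank for site in phosphosites):
        return "flanking"
    return None


# Hypothetical protein with phosphosites at residues 15 and 40.
sites = {15, 40}
labels = {pos: classify_psnv(pos, sites) for pos in (15, 22, 23, 33)}
# labels == {15: 'direct', 22: 'flanking', 23: None, 33: 'flanking'}
```

Checking the central residue first means a variant that hits one phosphosite while also sitting in the flank of another is counted as direct, the more disruptive of the two categories.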

A high-confidence network includes 392 pSNVs in 534 interactions, and comprises only top-scoring kinase binding sites for signaling gain and experimentally determined kinase-substrate signaling loss (Fig. 3a).

Mapping cancer drivers to a network of functional gene interactions

Comprehensive identification of mutational cancer driver genes across 12 tumor types

David Tamborero et al. Scientific Reports 10.1038/srep02650

We provide a list of 291 high confidence cancer driver genes (HCDs) acting on 3,205 tumors from 12 different cancer types by combining the results of four computational approaches [...] designed to find signals in genes reflecting the positive selection on cells during tumor emergence and evolution5–9.

When HCDs are mapped to a functional interaction network (see Methods), they appear enriched for biological processes within 5 broad modules—Chromatin remodeling, mRNA processing, Cell signaling/proliferation, Cell adhesion, DNA repair/Cell cycle—which loosely correspond to both established and emergent cancer hallmarks (Fig. 3).

These novel driver candidates appear alongside other well-established cancer genes. One may thus hypothesize that as more tumor genomes are sequenced, new lowly recurrent mutational drivers in these modules will emerge. This idea is further illustrated in Figure 4a, where, for example, well-known cancer genes within the cell cycle pathway are schematically represented together with poorly established HCDs. Examples of novel cell cycle driver candidates include ATR, a kinase which phosphorylates p53 and other proteins, [and] PIK3CG and PIK3CB, within the PIK3-AKT signaling pathway, which appear to complement the tumorigenic role of PIK3CA.

Effects of cancer mutations on regulatory RNA networks

MicroRNAs (miRNAs) modulate transcript (mRNA) stability and translation by binding to complementary sites on transcripts. miRNAs and their mRNA targets form a many-to-many network. Thus, genomic mutations that perturb components of miRNA-mRNA networks may cause complex disruptions in cellular regulatory networks.

Analysis of microRNA-target interactions across diverse cancer types

Anders Jacobsen et al. Nature Structural & Molecular Biology 10.1038/nsmb.2678

Recurrence of target associations across cancer types

To explore the hypothesis that individual miRNA-target relationships are active in multiple cancer types and may regulate common cancer traits, we developed a method and rank-based statistical score, the REC score. The method ranks miRNA-mRNA expression associations in the context of miRNA and cancer type and evaluates the null hypothesis that no association exists between the miRNA-mRNA pair in all cancer types (Fig. 1 and Online Methods).

Global analysis of interactions using public data sets

To further analyze whether recurrent pan-cancer miRNA-mRNA associations capture miRNA regulatory relationships, we evaluated the extent to which the REC score could predict mRNA expression changes induced by experimental perturbation of miRNAs in vitro.

In all analyzed miRNA perturbation experiments, we found that these REC target mRNAs were significantly downregulated or upregulated after miRNA overexpression or inhibition, respectively (Fig. 3, range of P values: 0.06–1.9 × 10⁻¹³, one-tailed Wilcoxon's rank-sum test, 7 < n < 179), consistent with the hypothesis that the recurrent pan-cancer miRNA-mRNA associations capture miRNA regulatory relationships.

These 143 putative recurring target interactions formed a network of 40 evolutionarily conserved miRNAs and 72 target mRNAs (Fig. 4b).

Testing the robustness of network assignment

Classification of patterns of mutation and structural rearrangement into coherent modules that are significantly different from one another is sensitive to the input data used. In Ciriello et al. (Nat. Genet. doi:10.1038/ng.2762), a number of controls were performed to test the robustness of modular classification.

Emerging landscape of oncogenic signatures across human cancers

Giovanni Ciriello et al. Nature Genetics 10.1038/ng.2762

Validation of the modularity optimization method

We tested our approach on two well-characterized data sets frequently used as benchmarks for network modularity detection. The first network is known as the Southern Women Event Participation network33. It represents women's attendance of social events in the Deep South, using data collected by Davis and colleagues in the 1930s to study social stratification. For this network, our approach was able to identify the two-module structure of the network (Supplementary Fig. 12) that coincides with the solution proposed by Guimerá and colleagues34 and, except for one woman, with the subjective solution proposed by the ethnographers that conducted the study.

Robustness of the subclasses.

The robustness of the subclasses was assessed by removal of different percentages of samples and reclassification of the reduced data sets. During each run, hierarchical stratification obtained with the reduced data set was mapped to the original one by mapping each module from the reduced classification to the module from the original classification that maximizes the overlap.
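
The mapping step described above can be sketched in a few lines of Python. Each module from the reduced classification is matched to the original module with which it shares the most samples. The agreement fraction computed afterwards is an illustrative summary statistic chosen for this sketch, not necessarily the measure used by the authors, and the sample and module names are hypothetical.

```python
def map_modules(reduced, original):
    """Map each reduced module to the original module with maximal overlap.

    reduced, original: dicts of module name -> set of sample identifiers.
    Ties are broken in favor of the original module listed first.
    """
    return {
        name: max(original, key=lambda o: len(members & original[o]))
        for name, members in reduced.items()
    }


def assignment_agreement(reduced, original):
    """Fraction of retained samples that fall in the mapped original module."""
    mapping = map_modules(reduced, original)
    kept = sum(len(members) for members in reduced.values())
    same = sum(
        len(members & original[mapping[name]])
        for name, members in reduced.items()
    )
    return same / kept if kept else 0.0


# Hypothetical classification of eight samples, then a rerun on six of them.
original = {"M": {"s1", "s2", "s3", "s4"}, "C": {"s5", "s6", "s7", "s8"}}
reduced = {"r1": {"s1", "s2", "s5"}, "r2": {"s6", "s7", "s8"}}
mapping = map_modules(reduced, original)
agreement = assignment_agreement(reduced, original)
```

Here module r1 maps to M and r2 maps to C, and five of the six retained samples keep their original assignment.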

We found that sample assignment to each subclass was robust in that it varied little upon systematic subsampling (Supplementary Fig. 6).

Validation of the results in independent data sets

Emerging landscape of oncogenic signatures across human cancers

Giovanni Ciriello et al. Nature Genetics 10.1038/ng.2762

Closer inspection of the distribution of selected functional events showed a striking inverse relationship between copy number alterations and somatic mutations at the extremes of genomic instability, particularly in highly altered tumors (Fig. 2c). Such tumors had either a large number of somatic mutations or a large number of copy number alterations, never both. We refer to this trend as the cancer genome hyperbola.

Tumors in the C class and M class were positioned along the two axes of this hyperbola. Whereas individual tumor types (defined by tissue of origin) had varying proportions of copy number alterations and mutations, none had high numbers of both.

We verified this approximately inverse relationship by adding 907 tumor samples from 6 additional tumor types to the pan-cancer set of 3,299 samples (Supplementary Fig. 4). In this larger data set, we also identified two major classes, one primarily dominated by mutations and the other primarily dominated by copy number alterations, with a remarkably similar set of characteristic functional events.

figure 1

(b) Example illustrating smoothing of patient somatic mutation profiles over a molecular interaction network. Mutated genes are shown in yellow (patient 1) and blue (patient 2) in the context of a gene interaction network. Following smoothing, the mutational activity of a gene is a continuous value reflected in the intensity of yellow or blue; genes with high scores in both patients appear in green (dashed oval). (c) Clustering mutation profiles using non-negative matrix factorization (NMF) regularized by a network. The input data matrix (F) is decomposed into the product of two matrices: one of subtype prototypes (W) and the other of assignments of each mutation profile to the prototypes (H). The decomposition attempts to minimize the objective function shown, which includes a network influence constraint L on the subtype prototypes. k, predefined number of subtypes. (d) The final tumor subtypes are obtained from the consensus (majority) assignments of each tumor after 1,000 applications of the procedures in b and c to samples of the original data set. A darker blue color in the matrix coincides with higher co-clustering for pairs of patients.
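
The consensus step in panel (d) relies on a patient × patient co-occurrence matrix. The Python sketch below shows one way such a matrix could be tallied from repeated clustering runs. Because each run sees only a subsample of patients, it divides the number of times a pair shared a cluster by the number of runs in which both patients were drawn. That normalization, the data layout and the patient labels are assumptions made for illustration.

```python
from itertools import combinations


def co_clustering_frequency(runs, patients):
    """Tally how often each pair of patients lands in the same cluster.

    runs:     list of dicts, one per clustering run, mapping each sampled
              patient to its cluster label in that run
    patients: list of all patient identifiers

    Returns a dict mapping (patient_a, patient_b) to the fraction of runs
    containing both patients in which they shared a cluster.
    """
    pairs = list(combinations(patients, 2))
    together = dict.fromkeys(pairs, 0)
    sampled = dict.fromkeys(pairs, 0)
    for labels in runs:
        for a, b in pairs:
            if a in labels and b in labels:
                sampled[(a, b)] += 1
                if labels[a] == labels[b]:
                    together[(a, b)] += 1
    return {p: together[p] / sampled[p] if sampled[p] else 0.0 for p in pairs}


# Three hypothetical runs; patient p3 was not sampled in the second run.
runs = [
    {"p1": 0, "p2": 0, "p3": 1},
    {"p1": 1, "p2": 1},
    {"p1": 0, "p2": 1, "p3": 0},
]
freq = co_clustering_frequency(runs, ["p1", "p2", "p3"])
```

Cluster labels are compared only within a run, so it does not matter that label 0 in one run and label 0 in another may denote different subtypes.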

figure 2

a. The high-confidence network of gain-of-signaling (green) and loss-of-signaling (red) events induced by pSNVs was predicted for 96 kinase binding models (p < 0.05). Node size corresponds to the number of network-rewiring pSNVs, and edge weight shows the predicted impact of a pSNV on kinase binding. Known cancer genes are shown as blue nodes and kinases are underlined.

figure 3

A) Network representation of HCDs. Trimmed version of the functional interaction network integrated by 124 HCDs that either map to the five broad biological modules enriched among HCDs or connect them. Genes annotated in the CGC are represented as rounded squares, HCDs not in the CGC as circles and non-HCDs used as linkers between HCDs as diamonds. Circles with a thicker border are 'novel' candidate drivers discussed in Supplementary Table 4 and shown in Figure 3. Genes with a clear preference for bearing PAMs in one tumor type (Fisher's odds ratio > 25) are colored following the project code shown in the figure legend. Colored shadows encircle genes within the five enriched biological modules. B) Frequency of PAMs observed in the HCDs of panel A across samples of each cancer type, following the tumor type color code. The annotations below indicate the methods that identify signals of positive selection in each gene. Genes with a clear preference for bearing PAMs in one tumor type are indicated with a colored square below the histogram, using the tumor type color code. 'Novel' driver candidates are shown in bold font.

figure 4

(b) Inferred pan-cancer network comprising 143 putative target interactions between 40 evolutionarily conserved miRNAs and 72 target mRNAs. Edge width represents strength of the REC score for a given miRNA-mRNA pair, and miRNAs are color coded by seed family relationships (singletons in white).

figure 5

We identified two main classes (blue and red), and within each main class we identified smaller groups of women who attended the same set of events (groups are indicated by dotted lines).

figure 6

Subclasses at all levels of the hierarchical stratification were separately tested for robustness to random sample removal. The top classes M and C (black dots) show almost perfect reproducibility upon removal of 5%, 20%, and 50% of the samples. Similarly, the M subclasses (green dots) maintain a constantly high robustness descending the classification. C subclasses (red dots) at the bottom of the classification are the most affected by sample removals, despite a robustness significantly higher than expected (gray dots). The non-focal nature of copy number alterations and the difficulty in identifying the corresponding functional targets are likely to impact the definition of these groups, weakening their robustness.

figure 7

(c) The distribution of SFEs in tumors indicates that the number of copy number alterations in a sample (x axis) is approximately anticorrelated with the number of somatic mutations in a sample (y axis). The number of samples for a given (x,y) position ranges from 0 (white) to 243 (dark blue). CNAs, copy number alterations.

figure 8

b, Samples from the resulting set of 18 tumor types (PANCAN18) confirmed the inverse relationships between copy number alterations and mutations, with the newly added samples following the same inverse trend (red squares) we originally observed (gray squares). [...] d, The ranked list of SFEs, with the first of the list being the most enriched in the M-class and the last the most enriched in C-class, is almost perfectly concordant with the same ranked list obtained in the PANCAN12 dataset (Kendall correlation coefficient tau = 0.9).
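
The Kendall correlation quoted in panel d compares two orderings of the same events. A minimal Python sketch of the coefficient for rankings without ties is given below. It counts concordant and discordant pairs directly, which is the tau-a variant; the caption does not say which variant was used, and the event names are hypothetical.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall tau-a between two orderings of the same items (no ties).

    rank_a, rank_b: lists containing the same items, most enriched first.
    Returns (concordant - discordant) / total number of pairs.
    """
    position_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    # combinations() yields (x, y) with x ranked before y in rank_a.
    for x, y in combinations(rank_a, 2):
        if position_b[x] < position_b[y]:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


# Two hypothetical rankings of four events that differ by one adjacent swap.
first = ["e1", "e2", "e3", "e4"]
second = ["e1", "e3", "e2", "e4"]
tau = kendall_tau(first, second)
```

With one discordant pair out of six, tau is (5 - 1) / 6, or about 0.67. Identical rankings give 1 and fully reversed rankings give -1, so a value of 0.9 indicates that very few pairs of events change their relative order between the two data sets.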