Review Article | Published:

Functional variomics and network perturbation: connecting genotype to phenotype in cancer

Nature Reviews Genetics volume 18, pages 395–410 (2017)


Proteins interact with other macromolecules in complex cellular networks for signal transduction and biological function. In cancer, genetic aberrations have been traditionally thought to disrupt the entire gene function. It has been increasingly appreciated that each mutation of a gene could have a subtle but unique effect on protein function or network rewiring, contributing to diverse phenotypic consequences across cancer patient populations. In this Review, we discuss the current understanding of cancer genetic variants, including the broad spectrum of mutation classes and the wide range of mechanistic effects on gene function in the context of signalling networks. We highlight recent advances in computational and experimental strategies to study the diverse functional and phenotypic consequences of mutations at the base-pair resolution. Such information is crucial to understanding the complex pleiotropic effect of cancer genes and provides a possible link between genotype and phenotype in cancer.

Key points

  • Current cancer therapies targeted to particular genetic lesions are primarily hampered by the extreme genetic heterogeneity observed across patient populations.

  • Cancer genomic variants and regulatory molecules interact with each other in cellular networks. Network biology has recently emerged as a systems-level approach to stratifying mutations that give rise to markedly different phenotypes.

  • Different cancer mutations often lead to distinct perturbations in signal transduction networks.

  • Several computational tools have been developed to analyse functional effects of cancer mutations and to prioritize drivers of oncogenesis.

  • Various experimental strategies have emerged to study mutation-specific functional effects. Such functional variomics approaches can dissect cancer variants at high resolution.

  • Integration of computational predictions with systems biology experimental approaches will be crucial for interpreting complex genotype-to-phenotype relationships in human disease including cancer. Together this effort represents a critical step towards precision medicine.


With rapidly evolving next-generation sequencing technologies, there has been an explosion in human genotypic information, particularly for disease mutations associated with numerous types of cancer1,2 (Fig. 1a). However, the functional paths by which heterogeneous genotypic variants lead to diverse phenotypic consequences remain largely unresolved3,4. More importantly, how the multiple genomic aberrations present in a single tumour integrate into the tumour phenotype, including response to therapy and patient outcomes, remains an overarching gap in knowledge in the field5,6. It is therefore crucial to identify the functional roles of distinct cancer mutations. A 'one gene, one function, one disease' model cannot be reconciled with the observation that different mutations in the same gene often lead to markedly diverse phenotypes7,8. It is now clear that genes and gene products do not function in isolation but rather interact with each other in cellular networks9,10. A systems-level understanding of how cancer mutations affect signalling networks is pivotal for interpreting the complex genotype-to-phenotype relationships in terms of tumour behaviour and patient outcomes11,12. This more sophisticated functional understanding of mutations is key to distinguishing drivers from non-pathogenic passengers, as well as to enhancing clinical diagnostics, prognostics and therapeutics13.

Figure 1: Complex genetic heterogeneity in human cancer.

a | Rapid increase in the number of cancer mutations identified over the past decade. Mutations were downloaded from the Catalogue of Somatic Mutations in Cancer (COSMIC) database and PubMed IDs for each mutation were extracted. The publication year for each PubMed ID was obtained by 'RISmed' R package. The number of mutations was plotted as a function of the corresponding publication year. b | Mutations occur in both coding and non-coding regions of cancer genomes. Coding mutations are located in genes that undergo mRNA transcription, which is often mediated by RNA polymerase II (RNAPII), and protein translation, which is mediated by ribosomes. Non-coding aberrations include mutations in cis-regulatory elements and in non-coding RNAs (ncRNAs). Furthermore, mutations can be small-scale point mutations up to larger-scale aberrations such as copy number variation and chromosomal rearrangements (not shown). c | Heterogeneous coding mutations across diverse cancer types. Somatic mutations across 33 types of cancer were obtained from The Cancer Genome Atlas (TCGA) project, comprising 10,489 tumour samples. The number of mutations per sample is plotted as a box plot, with the x-axis corresponding to cancer type and the y-axis representing the log10 number of mutations per sample. Cancer types are ordered from left to right based on tissue origin. 
ACC, adrenocortical carcinoma; BLCA, bladder urothelial carcinoma; BRCA, breast invasive carcinoma; CESC, cervical squamous cell carcinoma and endocervical adenocarcinoma; CHOL, cholangiocarcinoma; COAD, colon adenocarcinoma; DLBC, lymphoid neoplasm diffuse large B cell lymphoma; ESCA, oesophageal carcinoma; GBM, glioblastoma multiforme; HNSC, head and neck squamous carcinoma; KICH, kidney chromophobe; KIRC, kidney renal clear cell carcinoma; KIRP, kidney renal papillary cell carcinoma; LAML, acute myeloid leukaemia; LGG, brain lower grade glioma; LIHC, liver hepatocellular carcinoma; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; MESO, mesothelioma; OV, ovarian serous cystadenocarcinoma; PAAD, pancreatic adenocarcinoma; PCPG, pheochromocytoma and paraganglioma; PRAD, prostate adenocarcinoma; READ, rectum adenocarcinoma; SARC, sarcoma; SKCM, skin cutaneous melanoma; STAD, stomach adenocarcinoma; TGCT, testicular germ cell tumours; THCA, thyroid carcinoma; THYM, thymoma; UCEC, uterine corpus endometrial carcinoma; UCS, uterine carcinosarcoma; UVM, uveal melanoma.

Systematic high-throughput functional 'variomics' platforms to assess mutation-specific network perturbations are beginning to emerge11,12. In this Review, we discuss recent advances in the identification and functional characterization of genomic variants in human cancers, as part of an initiative towards creating a functional landscape of the cancer genomes. Integration of systems genetics with signalling networks will be crucial for prioritizing cancer-causing variants and for uncovering patient mutation-specific disease mechanisms and their resultant therapeutic liabilities4,6,13. Together, this effort represents a crucial step towards personalized medicine.

In this Review, we first provide a brief overview of different classes of genomic aberration that underlie cancer heterogeneity. We then discuss the importance of systems genetics and network biology for understanding the functional effects of various cancer mutations. To distinguish driver mutations from passenger mutations, we describe a toolkit of recent computational and bioinformatics algorithms to prioritize genomic mutations and to interpret their potential functional effect. To reveal the molecular underpinnings driving selection of particular mutations, we further describe emerging systems genetics technologies that enable the functional characterization of cancer variants in a high-throughput manner. Finally, with the integration of computational and experimental platforms, we aim to provide systems-level strategies to stratify cancer mutations at single nucleotide resolution, thereby bridging genotype to phenotype in cancer.

For brevity, we do not discuss genetic interaction networks or metabolic networks (reviewed in Refs 3,14,15), modelling of network dynamics and the analysis of network centrality (reviewed in Refs 3,6,16) or the functional platforms used to assess variants in non-coding regions (reviewed in Ref. 17), as these have been extensively reviewed previously.

Genetic heterogeneity in cancer

Genomic instability propels the accumulation of mutations in cancer cells and results in the rapid evolution of cancer genomes in response to both the microenvironmental stresses that occur during tumour evolution and the stresses induced by tumour therapy. Identifying and characterizing cancer driver mutations is among the most pressing needs for understanding the causality and progression of tumours, as well as for developing effective treatments tailored to specific patients with cancer. The plethora of sequencing data from recent whole-genome and whole-exome sequencing projects has begun to enable the elucidation of the mutational landscape of most common cancers and the functional and structural elements of cancer genomes. In this Review, we refer to 'genetic heterogeneity' primarily as distinct mutations between tumours or across patients.

Somatic versus germline mutations. The majority of cancer mutations are somatic. Approximately 90% of cancer genes show somatic mutations and 20% show germline mutations, with 10% showing both somatic and germline mutations18. Some somatic mutations, such as those in telomerase reverse transcriptase (TERT) and TP53 (which encodes p53), frequently occur across cancer lineages, but others can be more tissue specific. Somatic aberrations show much more diverse patterns compared with germline variants, including complex genomic rearrangements, such as chromoplexy19, chromothripsis20 and kataegis21. This is probably because of much reduced evolutionary constraints in somatic cells, as somatic mutations only need to be compatible with viability of the subset of cells in which they are found, whereas germline-inherited mutations will be present body-wide throughout development. The accumulation of somatic mutations in cancer cells is pivotal in cancer progression. The 'two-hit' hypothesis originally postulated that tumours develop as a consequence of a second somatic mutation occurring upon the first inherited or somatic mutation22. It is now clear that more than two 'hits' are needed for full emergence of the cancer phenotype in most tumours, with statistical models placing this at between two and six driver aberrations. Cancer is commonly viewed as an evolutionary process of genetic instability and natural selection, driven by ongoing accumulation of somatic mutations23,24. The continuous somatic evolution of cancers then contributes to genetic heterogeneity1 and clonal expansion25 during tumour progression.

Coding versus non-coding mutations. Protein-coding regions of the genome constitute the 'exome', which represents <2% of the whole genome, but contains ~85% of known disease-related variants26 (Fig. 1b). However, this is likely to be a technology bias as until recently high-throughput approaches to identify disease-related variants in non-coding regions have been limited. Whole-genome sequencing (WGS) has been applied for deep understanding of non-coding regulation in human cancer. By contrast, whole-exome sequencing (WES) has been widely used and has proved to be a reliable and cost-effective approach to reveal the somatic cancer mutation landscape of protein-coding regions27 (Fig. 1c). Non-coding regions contain functional modules that can elicit marked effects on expression profiles. Such cis-regulatory elements include promoters, enhancers, silencers and insulators (Fig. 1b). Mutations in these elements can drastically alter gene regulation. A potent example is TERT, which is reactivated in 80–90% of human cancers28, often through recurrent mutations in its promoter region29,30. These mutations create de novo binding sites for the GA-binding protein (GABP) transcription factors31,32. Non-coding RNAs (ncRNAs) are also frequent targets of genomic aberrations33,34,35.

Identification of mutations in the coding genome can reveal pathways that underlie cancer progression and can identify therapeutic drug targets. Recently, The Cancer Genome Atlas (TCGA) pan-cancer analysis36 has identified numerous cancer aberrations and has helped to assign the abnormalities to physical complexes and pathways. Indeed, the aggregate of mutations in a complex such as SWI/SNF, or in a pathway such as homologous recombination, establishes the importance of that complex or pathway in the tumorigenic process and further emphasizes the need to characterize aberrations not as independent events but rather as part of functional machines.

Driver versus passenger aberrations. Identification of cancer variants has been dominated by genotyping-by-sequencing. Indeed, more than 3 million independent variants in coding regions have been identified by sequencing, with only a small subset of these being functionally annotated. Importantly, not all genetic alterations are relevant to tumour progression. The terms 'driver' and 'passenger' distinguish causal versus random events in cancers37. A driver mutation provides a selective advantage to the tumour clone in its microenvironment at some point during its history, but is not necessarily required to sustain tumour growth throughout the evolution of the tumour. Driver aberrations can be derived from major events, such as chromosomal gains and losses, chromosomal shattering and chromosomal chains38. Alternatively, driver mutations can be derived from mutational hotspots, for example, V600E mutation in BRAF39.

In cancer, there are multiple classes of mutation: missense mutations, frame-shift mutations, silent mutations, nonsense mutations, insertions or deletions, non-coding mutations and others. The relative frequency of the classes of genomic aberration varies markedly across cancer lineages, although overall, missense mutations are by far the most frequent class of coding aberrations in human cancers (Fig. 2a,b). Kidney clear-cell carcinoma, glioblastoma multiforme, hepatocellular carcinoma, acute myeloid leukaemia, colorectal carcinoma and endometrial carcinoma, with the exception of the serous-like subtype, are dominated by single nucleotide mutations, whereas almost all serous ovarian and breast carcinoma samples and a large fraction of lung and head and neck squamous cell carcinomas are dominated by copy number variations40,41.

Figure 2: Mutational landscape across cancer types.

a | Distribution of different mutation classes across cancer types obtained from The Cancer Genome Atlas (TCGA) project. The fractions of nonsense mutations, silent mutations, frame-shift mutations and missense mutations were plotted for each cancer type. b | Pie charts showing the proportion of different mutation classes from TCGA in all the cancer types (pan-cancer) and in specific cancers, including uterine corpus endometrial carcinoma (UCEC), kidney renal clear cell carcinoma (KIRC) and liver hepatocellular carcinoma (LIHC). c | The Circos plot shows the mutational landscape of the Cancer Gene Census (CGC) genes across major cancer types. Mutations are distributed across diverse locations of the genome. Protein–protein interactions between these genes are depicted in the centre. Blue lines indicate binary direct interactions detected by the high-throughput enhanced yeast two-hybrid (HT-eY2H) system9, and orange lines indicate indirect interactions detected by affinity purification coupled with mass spectrometry (AP–MS)10. Purple lines indicate overlapping interactions detected by both.
ACC, adrenocortical carcinoma; BLCA, bladder urothelial carcinoma; BRCA, breast invasive carcinoma; CESC, cervical squamous cell carcinoma and endocervical adenocarcinoma; CHOL, cholangiocarcinoma; COAD, colon adenocarcinoma; DLBC, lymphoid neoplasm diffuse large B-cell lymphoma; ESCA, oesophageal carcinoma; GBM, glioblastoma multiforme; HNSC, head and neck squamous carcinoma; KICH, kidney chromophobe; KIRP, kidney renal papillary cell carcinoma; LAML, acute myeloid leukaemia; LGG, brain lower grade glioma; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; MESO, mesothelioma; OV, ovarian serous cystadenocarcinoma; PAAD, pancreatic adenocarcinoma; PCPG, pheochromocytoma and paraganglioma; PRAD, prostate adenocarcinoma; READ, rectum adenocarcinoma; SARC, sarcoma; SKCM, skin cutaneous melanoma; STAD, stomach adenocarcinoma; TGCT, testicular germ cell tumours; THCA, thyroid carcinoma; THYM, thymoma; UCS, uterine carcinosarcoma; UVM, uveal melanoma.

The prevalence of random mutations, non-cancer tissue mixed in tumours, clonal heterogeneity and ploidy variation makes it difficult to accurately 'call' mutations irrespective of whether the mutation is a driver or a passenger. There are several strategies to improve mutation calling. First, sufficient sequencing depth is required. WGS is often carried out with 30- to 60-fold coverage, WES is often carried out with 100- to 150-fold coverage and targeted sequencing of candidate gene panels is typically carried out at 200- to 2000-fold coverage38. Second, accurate estimation of the background mutation rate is essential for the discrimination of driver versus passenger mutations, as passenger mutations are expected to be present at similar frequencies to background mutation rates whereas driver mutations are expected to be present at higher frequencies because of the selective advantage they engender. Background mutation rates have traditionally been derived from synonymous mutation rates42 and intronic and untranslated region (UTR) mutations43. Despite lower depth in WGS or WES readouts compared with targeted gene panels, WGS and WES allow background mutation rates to be estimated with higher accuracy. Other genomic parameters have also been taken into account in modern algorithms such as MuSiC44 and MutSigCV45. It is likely that mutation rates are not uniform across genomes46 and that regional estimates of mutation rates that take into account replication timing and other factors are needed.
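The frequency argument above can be illustrated with a minimal sketch: a gene whose mutation count far exceeds what a background rate predicts is a driver candidate. This is not the MuSiC or MutSigCV procedure. It assumes a single uniform background rate (which the text notes is unrealistic), uses a Poisson model, and all function names and numbers are hypothetical.

```python
import math

def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    # Sum P(X = i) for i < k in log space, then take the complement.
    cdf_below_k = sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        for i in range(k)
    )
    return max(0.0, 1.0 - cdf_below_k)

def gene_burden_pvalue(n_mutations, gene_length_bp, n_samples, bg_rate):
    """P-value for observing at least n_mutations under a uniform background.

    bg_rate is an ASSUMED background mutation rate per base pair per sample.
    """
    expected = bg_rate * gene_length_bp * n_samples
    return poisson_upper_tail(n_mutations, expected)

# Hypothetical numbers: a 2,000-bp gene, 500 tumours, 1e-6 mutations/bp/sample,
# so about one mutation is expected by chance across the cohort.
p_recurrent = gene_burden_pvalue(10, 2000, 500, 1e-6)   # 10 observed
p_background = gene_burden_pvalue(1, 2000, 500, 1e-6)   # 1 observed
print(p_recurrent < p_background)
```

In practice the background rate would be estimated per gene or per region, for example from synonymous, intronic or UTR mutations as described above, rather than fixed globally.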

A cancer network rewiring perspective

Systems genetics: from pathways to cancer networks. Initial studies of the genetic heterogeneity of human cancers originated from pioneering WES projects in breast, pancreatic, colorectal and brain cancers47,48,49,50. These studies showed that cancer genomes are highly complex with 50–100 somatic alterations in each tumour. The Catalogue of Somatic Mutations in Cancer (COSMIC), which catalogues somatic mutation frequencies in tumours and tumour-derived cell lines, as of the latest Release v76 (February 2016), contained more than 3.9 million unique coding mutations in more than 5,000 human genes from more than 1 million tumour samples51. Once identified, cancer genes and mutations need to be assigned to functional and regulatory pathways. Common sets of conserved proteins participate in 'core' functional pathways that mediate essential cellular processes in different cell types and tissues52. However, it remains unclear how these conserved pathways function in different contexts to achieve signalling specificity and how mutations in cancer cells disrupt signalling pathways and cellular functions.

Although distinguishing causal disease variants from non-pathogenic polymorphisms is often insufficient to establish mechanisms or to predict phenotypic outcomes, identifying causal mutations remains a key, but challenging, step for functional interpretation of cancer genomes53. Classic gene knockout or knockdown approaches cannot always resolve the diverse biological functions mediated by different mutations of the same gene54. Therefore, understanding how specific variants affect molecular interaction networks is crucial for interpreting complex genotype-to-phenotype relationships in cancer3,14,54.

Effect of genetic mutations on cancer networks. Protein products of mutated cancer genes do not function in isolation but rather are part of highly interconnected cellular networks3, which are often depicted as nodes (molecules) and edges (interactions)3,6,55 (Fig. 2c). In interaction networks, a genetic mutation can either produce gene knockout-like behaviour, in which the node loses all of its interactions in the network, or, alternatively, produce an interaction-specific ('edgetic') perturbation, in which specific interactions are lost or gained12 (Fig. 3a). Edgetic mutations tend to be located at interaction interfaces with protein partners. For example, the edgetic mutations R24C and R24H in cyclin-dependent kinase 4 (CDK4), which are associated with melanoma, reside at the protein interaction interface with the partner CDK inhibitor 2C (CDKN2C; also known as p18INK4C) (Fig. 3b). Similarly, the edgetic mutation F194S in fructose bisphosphatase 1 (FBP1) is located at the interaction interface (Fig. 3c).
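The node-and-edge distinction described above can be made concrete with a short sketch that compares the interaction partners of a wild-type protein with those of a mutant. The function name, the three labels and the partner names are illustrative assumptions, not taken from any published tool or data set.

```python
def classify_perturbation(wt_partners, mut_partners):
    """Compare wild-type and mutant interaction partners of one protein.

    Returns (label, lost, gained), where label is one of:
      'knockout-like' - the mutant retains no interactions at all
      'edgetic'       - specific interactions are lost and/or gained
      'unperturbed'   - the interaction profile is unchanged
    """
    wt, mut = set(wt_partners), set(mut_partners)
    lost, gained = wt - mut, mut - wt
    if wt and not mut:
        label = "knockout-like"
    elif lost or gained:
        label = "edgetic"
    else:
        label = "unperturbed"
    return label, lost, gained

# Hypothetical partners P1..P4 of a single protein.
wild_type = {"P1", "P2", "P3"}
print(classify_perturbation(wild_type, set()))               # every edge lost
print(classify_perturbation(wild_type, {"P1", "P2"}))        # one edge lost
print(classify_perturbation(wild_type, wild_type | {"P4"}))  # one edge gained
```

The partner sets would typically come from interaction assays performed in parallel on wild-type and mutant clones.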

Figure 3: Effects of cancer variants on molecular interaction networks in cells.

a | Schematic illustration of distinct molecular interaction profiles caused by heterogeneous genetic mutations. Nodes are macromolecules such as proteins, DNAs and RNAs, whereas edges are biophysical interactions between them. Solid lines represent retained interactions and dashed lines represent interactions perturbed by mutations. Cancer-associated mutations can cause a wide range of effects on cellular interaction networks, including loss of all interactions, edgetic perturbation of specific interaction(s) and edgetic gain of interaction(s). b | Locations of residues affected by mutations are highlighted on the cyclin-dependent kinase 4 (CDK4) structure based on homology modelling (Protein data bank (PDB) ID = 1bi7). c | Locations of residues affected by mutations are highlighted on the fructose bisphosphatase 1 (FBP1) structure (PDB ID = 1fpi). CDKN2C, CDK inhibitor 2C.

Prioritization and comprehensive understanding of driver mutations requires the integration of systematic large-scale experimental approaches with computational algorithms. In the next two sections, we cover these aspects in detail.

Computational prediction of drivers

To predict driver mutations, computational systems biology has established and implemented modelling algorithms based on existing big data sets in cancer. In this section, we summarize recent computational methodologies for characterizing the function of cancer mutations from the network perspective and classify them into three main categories: node-level, subnetwork-level and edge-level predictions (Fig. 4; Table 1; see Supplementary information S1 (table)).

Figure 4: Computational tools that prioritize cancer genes and mutations.

Coding genes and non-coding regulatory elements interact with each other in cellular networks (centre). Cancer-associated mutations from The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC) and other sources can be computationally prioritized as driver candidates, on the basis of distinct features, including sequence (top left), structure (top right), regulatory features (middle left) and functional and network features (middle right). Sequence features include conservation, mutation frequency, and so on. Structural features include the three-dimensional structural configuration of proteins, free energy changes, and so on. Regulatory features include cis-regulatory sequences annotated by the Encyclopedia of DNA Elements (ENCODE) and its gene-annotation subproject, GENCODE. Functional and network features include gene ontology, pathway and network centrality measurements. Cancer-causing driver mutations can also be prioritized by their effects on rewiring networks, including protein–protein, transcription factor (TF)–gene and microRNA (miRNA)–gene interaction networks. A list of computational tools is shown in the relevant white boxes. SNV, single nucleotide variant.

Table 1: Computational algorithms to prioritize cancer variants with functional effects

Node-centred effects of cancer mutations

Sequence features. One of the most commonly used methods for estimating the functional effect of mutations is sequence comparison based on multiple alignment. These methods assume that amino acid substitutions in highly conserved positions are deleterious (for example, SIFT56,57 and SIFT4G58). Some algorithms also incorporate sequence-based data with clinical data for inferring the relationships between mutations and the affected genes44. In addition, drivers can be identified by finding genes that harbour significantly more mutations than expected by chance59,60, such as MutSig45 and MSEA61. Together, although these methods are informative in driver prioritization, their accuracy may be influenced by empirically observed local mutation frequencies. The available large-scale WES or WGS data sets from several human genome projects, such as the 100,000 genomes project62 and the Icelandic genome project63, may provide valuable information for building background mutation models. However, although these algorithms have reasonable predictive value for loss-of-function aberrations, they are not very accurate in predicting gain-of-function or edgetic aberrations.
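The conservation assumption behind alignment-based methods can be sketched with a simple per-column score. This is not the SIFT algorithm. It uses normalized Shannon entropy as a stand-in conservation measure, and the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

def column_conservation(column):
    """Score one alignment column: 1.0 = fully conserved, 0.0 = maximally variable.

    Computed as 1 - H / log2(20), where H is the Shannon entropy of the
    amino acid frequencies in the column; gap characters ('-') are ignored.
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
    return 1.0 - entropy / math.log2(20)

def alignment_conservation(alignment):
    """Return one conservation score per column of equal-length aligned sequences."""
    return [column_conservation(col) for col in zip(*alignment)]

# Hypothetical alignment of four homologous sequences.
toy_alignment = ["MKVLA", "MKILA", "MRVLG", "MKVL-"]
scores = alignment_conservation(toy_alignment)
print(scores)
```

Under the stated assumption, a substitution at a high-scoring position would be flagged as more likely to be deleterious than one at a variable position.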

Structural features. Computational tools have been designed by integrating structural features with sequence information (for example, PolyPhen-2 (Ref. 64) and STRUM65). Some methods rely on the evidence that many driver mutations recurrently occur in specific structural regions of proteins (for example, protein domains and disordered regions)2,66,67,68,69 or disrupt the active sites (for example, phosphorylation sites)70,71,72. Recent methods map genetic mutations onto protein three-dimensional structures to evaluate the functional effect of mutations at high resolution73,74,75. In addition, some algorithms integrate multiple evolutionary and structural features to evaluate the disease-causing potential of mutations, such as CanDrA76 and MutationTaster77. Last but not least, other methods prioritize driver mutations on the basis of their location at the structural binding sites for small molecules (for example, CanBind78 and SGDriver79). Notably, most of these approaches predict mutational effects on the function of coding genes.

Regulatory features. The Encyclopedia of DNA Elements (ENCODE) project80 has provided a comprehensive map of regulatory elements by advanced techniques such as chromatin immunoprecipitation followed by sequencing (ChIP-seq), DNase-seq and chromosome conformation capture. Several computational methods to investigate the regulatory effects of cancer mutations have been proposed on the basis of various regulatory features in ENCODE (for example, ANNOVAR81,82). To score the deleterious consequences of genetic variants, some tools integrate a wide range of annotations (including genomic and epigenomic features) into one metric (for example, CADD83, GWAVA84, FitCons85 and deltaSVM86). Although these methods help to prioritize driver mutations, they often neglect the chromatin structural context of these regulatory regions. To overcome this problem, other algorithms were developed for predicting driver variants through the integration of chromatin effects87 and high-dimensional regulatory interactions88. Together, these predictive methods have established a useful toolkit to explore the function of cancer mutations in regulatory regions and their effect on the interactions with target genes.

Significantly mutated subnetworks or pathways

Although the above-mentioned computational methods have reached a reasonable level of accuracy for predicting loss-of-function aberrations, multiple lines of evidence have shown that integration of gene ontology and network features is crucial in providing a holistic measure of the functional consequence of a cancer mutation. Some algorithms assess the functional effect of cancer mutations taking into account the observation that genes with distinct ontology terms possess different baseline tolerance for deleterious mutations89. To further enhance the predictive power, other methods prioritize cancer-causing mutations by integrating sequence and structural features with gene ontology similarity (for example, CanPredict90). Recently, the rapid accumulation of protein interaction network data has provided a new basis for studying the topological features of cancer genes and mutations in cellular networks. It has been shown that cancer genes tend to possess high topological centrality, even higher than that found in essential genes. Several predictive algorithms combine network centrality with large-scale genomic resources for prioritizing variants in cancer (for example, SuSPect91 and FunSeq2 (Ref. 92)). It has been shown that network centrality helps to discriminate between disease-associated and tolerated mutations.
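As a rough illustration of a network centrality feature, the sketch below computes degree centrality from an undirected edge list. Degree is only one of several centrality measures and is chosen here as an assumption for simplicity; it is not the scoring used by SuSPect or FunSeq2, and the node names are placeholders.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list.

    Self-interactions are ignored and duplicate edges are counted once.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}

# Hypothetical interaction network in which HUB touches every other protein.
toy_edges = [("HUB", "P1"), ("HUB", "P2"), ("HUB", "P3"), ("P1", "P2")]
centrality = degree_centrality(toy_edges)
ranked = sorted(centrality, key=centrality.get, reverse=True)
print(ranked[0])
```

A centrality value of this kind could then be combined with sequence or structural features when ranking candidate variants.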

An important observation from analyses of the landscape of cancer mutations is that different tumours or patients with cancer show distinctly different mutational profiles. Furthermore, mutated genes tend to fall into a limited number of recurrently mutated subnetworks or pathways. This observation has stimulated systems-level approaches for detecting possible driver mutations and integrative analyses to identify significantly altered pathways. Using network approaches, several algorithms prioritize mutations in cancer on the basis of their effects on transcriptional output (for example, DIGGIT93) or their links to dysregulated genes from gene expression data (for example, DriverNet94, TieDIE95 and OncoIMPACT96). These algorithms are informative in identifying network modules that are related to downstream transcriptional changes induced by cancer mutations. In addition, some methods identify cancer or subtype-related subnetworks by diffusing cancer mutations throughout a network based on network propagation process (for example, VarWalker97, HotNet98, NBS13 and HotNet2 (Ref. 99)).
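Network propagation can be sketched as a random walk with restart that spreads mutation scores from mutated genes to their network neighbours. This is a generic illustration, not the specific diffusion scheme of VarWalker, HotNet, NBS or HotNet2. The restart probability, the function name and the toy network are assumptions, and the adjacency map is assumed to be symmetric (undirected).

```python
def propagate_mutations(adjacency, seed_scores, restart=0.5, max_iter=200, tol=1e-10):
    """Random walk with restart on an undirected network.

    adjacency   : {node: set of neighbouring nodes}, assumed symmetric
    seed_scores : {node: initial mutation score}; missing nodes count as 0
    Returns {node: propagated score}; scores sum to 1 on a connected network.
    """
    total = sum(seed_scores.get(n, 0.0) for n in adjacency)
    if total <= 0:
        raise ValueError("at least one seed node needs a positive score")
    p0 = {n: seed_scores.get(n, 0.0) / total for n in adjacency}
    p = dict(p0)
    for _ in range(max_iter):
        new_p = {}
        for node in adjacency:
            # Each neighbour passes on its score, split evenly over its own edges.
            inflow = sum(p[nb] / len(adjacency[nb]) for nb in adjacency[node])
            new_p[node] = (1.0 - restart) * inflow + restart * p0[node]
        change = sum(abs(new_p[n] - p[n]) for n in adjacency)
        p = new_p
        if change < tol:
            break
    return p

# Hypothetical path network A - B - C with a single mutated gene, A.
toy_network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
smoothed = propagate_mutations(toy_network, {"A": 1.0})
print(sorted(smoothed, key=smoothed.get, reverse=True))
```

After propagation, unmutated genes close to mutated ones receive non-zero scores, which is what allows subnetworks rather than individual genes to be compared across tumours.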

Edgetic effects of cancer mutations

In this section, we discuss the computational methods to functionally characterize the edgetic effects of cancer mutations, especially in the context of protein–protein, transcription factor–gene and microRNA (miRNA)–gene interaction networks.

Protein–protein interaction context. The prediction of the effect of a cancer mutation on protein–protein binding can be used to identify driver mutations. Some methods predict deleterious mutations on the basis of the mutation-induced changes in binding free energy (BeAtMuSiC100) or in their 3D protein complex context (Structure-PPi101). Other methods also evaluate the effects of mutations on protein interactions on the basis of force fields and statistical potentials and fast side-chain optimization algorithms (for example, MutaBind102). More globally, several algorithms (for example, dSysMap103) predict drivers by mapping missense mutations onto the structurally annotated human interactome from Interactome3D104, which is a valuable resource for exploring the edgetic role of disease mutations. A disadvantage of these methods is that they rely on known three-dimensional structures that are only available for a small proportion of proteins. By design, most studies of edgetic mutations have focused on loss of interaction; however, it is likely that cancer mutations could also result in a gain of interaction105. These edgetic effects are as yet poorly predicted by current algorithms and require both development of new algorithms and iterative improvement of these algorithms using experimental data71,72.

Gene regulatory context. Edgetic modelling of cancer mutations is not limited to protein interactions but can be applied to transcription regulatory networks in different contexts106,107 and in non-coding regions11,12. Many mutations identified by genome-wide association studies (GWAS) are likely to be regulatory single nucleotide polymorphisms (SNPs) that affect the ability of a transcription factor to bind to DNA. On the basis of this hypothesis, some algorithms score mutant alleles with a position weight matrix (PWM) to detect disruptive transcription factor mutations (for example, is-rSNP108). Additional tools annotate cancer drivers by calculating the change of a binding site caused by genetic mutations (for example, HaploReg109 and OncoCis110). Moreover, biophysical modelling of protein–DNA interactions helps to predict SNPs that cause considerable changes in the binding affinity of transcription factors (for example, BayesPI-BAR111). Finally, the complex miRNA–gene regulatory networks have been shown to control many key cellular processes that are dysregulated in cancers112,113. Several databases have been constructed for compiling mutations that are predicted to perturb miRNA-mediated gene regulation, such as Patrocles114, SomamiR115 and PolymiRTS116,117,118. Together, these available tools are invaluable in predicting a large number of cancer-associated regulatory mutations in signalling networks.
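The PWM-based idea above can be sketched by scoring the reference and mutant alleles of a candidate binding site and taking the difference. This is not the is-rSNP implementation. The log-odds scoring against a uniform background, the function names and the toy four-base motif are all assumptions for illustration.

```python
import math

def pwm_log_odds(pwm, site, background=0.25):
    """Sum of log2(p(base at position) / background) over a site.

    pwm is a list of {base: probability} dicts, one per motif position,
    with no zero probabilities (pseudocounts already applied).
    """
    if len(site) != len(pwm):
        raise ValueError("site and PWM must have the same length")
    return sum(math.log2(column[base] / background)
               for column, base in zip(pwm, site))

def allele_score_change(pwm, ref_site, alt_site):
    """Positive values suggest a gained or strengthened site; negative, a disrupted one."""
    return pwm_log_odds(pwm, alt_site) - pwm_log_odds(pwm, ref_site)

def toy_column(preferred):
    """Hypothetical PWM column that strongly prefers one base."""
    return {base: (0.85 if base == preferred else 0.05) for base in "ACGT"}

# Hypothetical motif preferring the sequence GGAA.
toy_pwm = [toy_column("G"), toy_column("G"), toy_column("A"), toy_column("A")]
delta_loss = allele_score_change(toy_pwm, "GGAA", "GGAT")  # mutation disrupts the site
delta_gain = allele_score_change(toy_pwm, "GGAG", "GGAA")  # mutation creates the site
print(delta_loss < 0 < delta_gain)
```

The same comparison applies in both directions, so one function covers mutations that destroy an existing site and mutations that create a de novo site.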

Systems-level experimental platforms

Functional analysis of cancer genes and mutations is key to understanding tumorigenic mechanisms and to developing therapeutic methods. Advances in large-scale experimental platforms and screens have revolutionized our ability to study cancer mutations and have begun to reveal the functional networks of cancer mutations. In this section, we focus on recent advances in functional characterization of coding mutations on a large scale.

Transcriptome profiles altered by mutations. Transcriptome profiling has been extensively used in the past decade for functional genomics40,119. In human cancer, gene expression profiles in tumours are often compared with those in matched control tissues. RNA sequencing (RNA-seq) is one of the most common approaches for transcriptomic studies. In RNA-seq, total RNA is extracted from mutant or control samples and reverse transcribed to generate a cDNA library (Fig. 5a). After adaptors are added, the samples are processed by next-generation sequencing. Computational algorithms are available to facilitate downstream RNA-seq data analysis. Recently, another transcriptomic platform has emerged: the library of integrated network-based cellular signatures (LINCS) L1000 (Ref. 120). LINCS L1000 can profile gene expression changes following genetic perturbation (mutations) of cell lines at high throughput. L1000 detects transcript abundance with optically addressed microspheres and a flow cytometric system (Fig. 5b). As a result, L1000 directly measures a reduced representation of the transcriptome and how it changes in response to a genetic mutation. Taken together, these techniques are robust in their assessment of global RNA expression levels for a given genetic background.
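The mutant versus wild-type comparison that underlies both platforms can be sketched in a few lines of Python. This is a simplified illustration only: the function name log2_fold_changes, the pseudocount of 1 and the input layout (gene name mapped to replicate expression values) are assumptions, and real analyses also require normalization and statistical testing.

```python
import math
from statistics import mean


def log2_fold_changes(wild_type, mutant, pseudocount=1.0):
    """Per-gene log2 ratio of mean mutant to mean wild-type expression.

    Both arguments map a gene name to a list of replicate expression values.
    Only genes measured in both conditions are reported. The pseudocount
    (an assumed default) avoids division by zero for unexpressed genes.
    """
    shared_genes = wild_type.keys() & mutant.keys()
    return {
        gene: math.log2((mean(mutant[gene]) + pseudocount)
                        / (mean(wild_type[gene]) + pseudocount))
        for gene in sorted(shared_genes)
    }
```

For example, a hypothetical gene with a mean of 99 units in wild-type cells and 399 units in mutant cells yields a log2 fold change of 2, that is, a roughly fourfold increase.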

Figure 5: Experimental platforms to characterize cancer mutations.

Experimental pipelines for systematically characterizing functional changes induced by patient-specific mutations. Mutant clones are usually tested in parallel with their wild-type counterparts for comparative purposes. a | RNA sequencing (RNA-seq) compares the transcriptomic profiles of wild-type and mutant cells, based on next-generation sequencing of their respective cDNA pools. b | Library of integrated network-based cellular signatures (LINCS) L1000 compares the transcriptomic profiles of wild-type and mutant cells, based on a biotin and streptavidin, phycoerythrin conjugated (SAPE) optical detection system. c | Reverse-phase protein array (RPPA). RPPA is a high-throughput protein microarray technology that detects protein expression levels in tissue or cell lysates (wild-type or mutant) based on specific antibodies. d | Affinity purification coupled with mass spectrometry (AP–MS) identifies changes in protein interaction partners between wild-type and mutant proteins, based on antibody-based AP, followed by MS. The type of MS indicated is liquid chromatography coupled to tandem mass spectrometry (LC–MS/MS). e | Enhanced yeast two-hybrid (eY2H). The DNA-binding domain (DBD) and activation domain (AD), fused to bait and prey proteins, respectively, reconstitute the transcription factor activity if brought into close proximity by the interaction between bait and prey proteins. f | Protein fragment complementation assay (PCA). Two complementary protein fragments fused to bait and prey proteins, respectively, reconstitute the full-length functional protein when brought into close proximity by the interaction between bait and prey proteins. g | ChIP-seq combines chromatin immunoprecipitation (ChIP) with next-generation sequencing to identify DNA sites to which proteins (such as transcription factors) bind. h | Enhanced yeast one-hybrid (eY1H). A DNA fragment is cloned upstream of a reporter. Upon binding of the protein of interest, the reporter is turned on and the activity can be measured. i | Protein-binding microarray (PBM). PBM is an in vitro technology that detects a broad spectrum of DNA-binding specificities for transcription factors on a large scale, based on fluorescence measurement. j | CRISPR. The CRISPR system consists of two components: the CRISPR-associated endonuclease 9 (Cas9) and the single guide RNA (sgRNA). The specificity of the endonuclease is determined by the complementarity of the sgRNA and its 20-nucleotide target sequence in the genome. The Cas9 endonuclease creates DNA double-strand breaks (DSBs) at the target site, which are repaired either by the non-homologous end-joining (NHEJ) mechanism generating gene 'knockouts' or by homology-directed repair (HDR) for precise editing of the genome. Generated mutant strains can be tested downstream for alterations in molecular interactions, gene expression, fitness and drug resistance or sensitivity.

Proteome profiles altered by mutations. To monitor protein expression levels in mutants on a large scale, antibody-based platforms for targeted or global quantification of protein expression have been developed. Reverse-phase protein array (RPPA) technology (Fig. 5c) is a type of protein microarray that uses antibodies to detect relative expression levels of proteins in tissue or cell lysates from hundreds of samples simultaneously. This technology has been applied to a large number of mutation-bearing tumour samples from patients with cancer. A stringent antibody validation procedure must be in place to ensure the sensitivity, specificity and robustness of the platform. In addition, RPPA enables protein profiling with a small amount of sample and is a cost-effective technology. At present, most RPPA data sets include 150–300 antibodies that measure total proteins or specific modifications, such as phosphorylation, cleavage and fatty acid alteration.

As a result of these technical advantages, a number of studies have used the RPPA platform to analyse the expression of proteins involved in cell cycle progression, apoptosis, signalling network activities and other key pathways that are associated with specific mutation variants. Furthermore, RPPA has proved to be an efficient approach to assess functions of rare mutations in a high-throughput manner and to identify driver mutations. A recent study121 characterizing phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit-α (PIK3CA) mutations using RPPA showed that, even though the effect of its hotspot mutations is well-established, differential oncogenic activity and variant-specific activation of PI3K signalling and other pathways, such as the mitogen-activated protein kinase (MAPK) pathway, have been observed across lower frequency mutations. RPPA has also been used to characterize mutations in PIK3 regulatory subunit 1 (PIK3R1), showing that mutations can have marked edgetic effects through interrupting protein–protein interactions (PPIs) and also through gaining new interaction partners122,123,124. These studies suggest mutation-specific targeting as a potentially more efficient approach in precision medicine.

Affinity purification coupled with mass spectrometry (AP–MS) enables the detection of protein expression and PPIs in near-physiological conditions. After affinity purification using an antibody against the bait protein, mass spectrometry is then used to provide global and targeted profiling of protein expression (or modification) (Fig. 5d). Following normalization, the peak intensity patterns are analysed and compared across multiple samples125. To study interaction changes by cancer mutations, AP–MS has been applied to characterize changes in PPIs by melanoma-associated mutations in human CDK4 (Ref. 125). Various quantitative techniques have been applied to MS-based proteomic analysis, including label-free, metabolic labelling (for example, stable isotope labelling with amino acids in cell culture (SILAC)) and chemical labelling (for example, isobaric tag for relative and absolute quantification (iTRAQ) or tandem mass tag (TMT)). Recent MS-based studies using iTRAQ labelling have shown the ability to identify 8,000–11,000 proteins and 25,000 phosphosites per tumour on average126,127. However, these approaches require a large amount of input material, remain time consuming and are costly. Although restricted by the number of available antibodies, RPPA is a robust approach to evaluate protein expression levels under different conditions or in distinct cell types. By contrast, AP–MS provides a complementary assessment of protein expression, which is global but less specific.

Protein–protein interaction changes induced by mutations. Genetic mutations can impair protein interactions. Mechanistic understanding of cancer-associated mutations requires finding the molecular interactions and biochemical activities that these mutations perturb. For example, the cancer-associated C305F mutation in the zinc finger domain of MDM2 causes the loss of its binding to ribosomal proteins. This interaction perturbation disrupts the ribosomal stress response, contributing to tumorigenesis128. In addition, the missense mutation R172H of p53 leads to a gain of interaction with the tumour suppressor disabled homologue 2-interacting protein (DAB2IP), which inhibits DAB2IP function and increases the invasive behaviour in cancer cells105. Although some cancer mutations have been characterized, the functional mechanism behind most variants remains elusive4,7. Network approaches using interactome maps have been successful for highlighting candidate cancer genes and disease modifier genes9,129; however, the effect of most causal variants on interaction networks remains mainly unknown.

Recently, several large-scale functional variomics and proteomics platforms have been applied to profile mutation-induced changes in molecular interactions, especially PPIs, relative to their wild-type counterparts. The high-throughput Gateway-compatible enhanced yeast two-hybrid130 (HT-eY2H) system (Fig. 5e) and the protein fragment complementation assay (PCA)131 (Fig. 5f) have been implemented to detect PPI alterations. In these systems, each of the two protein partners is fused to a fragment of a transcription factor or an enzyme; each fragment is stable but inactive in isolation, whereas a PPI reconstitutes the transcription factor or enzyme function. A recent large-scale investigation of variant-specific effects on PPIs across diverse human diseases, including cancer, found that disease mutations were more likely than non-disease polymorphisms to be associated with interaction perturbations12.
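The comparison of a mutant interaction profile with its wild-type counterpart can be sketched as a simple set operation. The following Python example is a minimal illustration under stated assumptions: the function name compare_interaction_profiles, the three category labels and the partner names are hypothetical, and real binary interaction screens require replicate testing and scoring thresholds that are not modelled here.

```python
def compare_interaction_profiles(wild_type_partners, mutant_partners):
    """Compare the interaction partners detected for a wild-type protein
    with those detected for one mutant allele of the same protein.

    Returns the lost and gained partners plus a coarse label:
      'unperturbed'           -> identical partner sets
      'all interactions lost' -> the mutant retains no partner
      'edgetic'               -> some, but not all, edges are changed,
                                 including any gain of interaction
    The labels are illustrative and do not reproduce a published scheme.
    """
    wild_type = set(wild_type_partners)
    mutant = set(mutant_partners)
    lost = wild_type - mutant
    gained = mutant - wild_type
    if not lost and not gained:
        label = "unperturbed"
    elif wild_type and not mutant:
        label = "all interactions lost"
    else:
        label = "edgetic"
    return {"lost": sorted(lost), "gained": sorted(gained), "label": label}
```

For instance, a hypothetical allele that loses PARTNER_2 but gains PARTNER_9 would be reported with one lost edge, one gained edge and the label 'edgetic'. Profiles of this kind can be compared across alleles of the same gene.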

Protein–DNA interaction changes by mutations. WGS has revealed abundant genetic variation affecting not only coding sequences, but also non-coding regulatory elements. For example, many point mutations in the transcription factor runt-related transcription factor 1 (RUNX1) cause defective DNA binding, resulting in a familial platelet disorder that predisposes individuals to acute myeloid leukaemia132. Frequent mutations detected in the TERT promoter create de novo binding motifs for ETS transcription factors and play a central role in cancer-specific telomerase activation31,32. Although protein–DNA interactions have been characterized in a few cases, it remains unclear how the majority of transcription factor mutations or non-coding DNA mutations affect their interactions.

ChIP-seq assays have been extensively used to map genome-wide transcription factor occupancy profiles (Fig. 5g), for example, in the ENCODE project133,134. In addition, several systems biology platforms have emerged to study protein–DNA interactions on a large scale11,107,135,136, such as enhanced yeast one-hybrid (eY1H) and protein-binding microarrays (PBMs). In the eY1H assay, a putative regulatory DNA sequence is used as bait to search for transcription factors that bind to that DNA sequence in yeast cells137. A reporter gene is often placed downstream of the DNA sequence to assess protein–DNA interactions (Fig. 5h). Using these assays, a systematic study of the binding of ~1,000 human transcription factors to a large number of enhancer mutations found widespread protein–DNA interaction perturbations in disease, which correlate well with target gene expression changes11. Finally, large-scale transcription factor-binding activity can be evaluated using PBMs107 (Fig. 5i). A recent systematic study used PBMs to measure the DNA-binding affinity of transcription factor variants and found that individuals with distinct mutations have unique transcription factor DNA-binding profiles, which may contribute to phenotypic variation107.

Pleiotropic mutational effects and integrative analysis. In cancer, mutations occur in the context of genomic, transcriptomic and/or epigenetic aberrations. To gain a systems-level understanding of the functional effects of mutations, an integrative analysis of multi-omics data sets (such as gene expression, DNA copy number and DNA methylation) is crucial. Several approaches have been proposed to this end. XSeq138 analyses the effect of somatic mutations by incorporating gene expression, patient mutations and a gene interaction network. PARADIGM-SHIFT139 is another example; it infers mutated gene activity from gene expression and copy number in the context of genetic pathways. Integration of proteomics analysis with genomic data has enabled the detection of global proteomic patterns associated with potential driver genetic lesions. For example, a study of lung adenocarcinoma cell lines integrated differential protein expression data with p53 mutational status and identified an enrichment of key functional pathways, including epithelial adhesion, immune and stromal cells and mitochondrial function140. Given that genomic mutations often act in a cell type-specific and condition-dependent manner, such integrative modelling is more likely to resolve the functional effects of mutations because it controls for other changes.

Functional validation by CRISPR in cancer. CRISPR has emerged as a powerful and flexible tool141,142,143,144 to systematically interrogate cancer genomes. Together with other technologies, CRISPR has produced valuable data on the identification of new cancer genes as well as on the functional consequence of driver mutations. In the most widely used approach for CRISPR-based genome editing, a CRISPR-associated (Cas) nuclease, usually Streptococcus pyogenes Cas9, is guided to a genomic target site by single guide RNAs (sgRNAs), where it creates DNA double-strand breaks (DSBs)145. DSBs are typically repaired by non-homologous end-joining (NHEJ), leaving a random sequence scar of a small insertion or deletion (indel) that can inactivate the targeted gene of interest146,147 (Fig. 5j). Current genome-wide CRISPR screen libraries contain 7 × 10⁴–2 × 10⁵ sgRNAs, with 3–12 sgRNAs for each gene148,149,150,151,152,153,154,155. Current CRISPR screens are useful for revealing gene-level information, but they lack the resolution to distinguish different mutations within a gene.

To precisely model specific cancer mutations by CRISPR, a homologous DNA template (single stranded or double stranded) is provided to convert the site of the DSB into a desired sequence in a process known as homology-directed repair (HDR) (Fig. 5j). However, HDR is relatively inefficient compared with NHEJ, and its products can be further corrupted by indels. Some recent efforts have been made to improve the applicability of HDR in specific mutation editing156,157,158,159. Synchronizing the expression of Cas9 with cell cycle progression160 or treating cells with two small molecules, L755507 and brefeldin A161, could improve the HDR efficiency. The discovery of the smaller Cas9 from Staphylococcus aureus allows the packaging of Cas9 and sgRNA expression constructs into the highly versatile adeno-associated virus (AAV) delivery vehicle162, enabling efficient HDR in specific organs such as the liver163.

Recently, direct base change has been achieved by fusing nuclease-dead Cas9 (dCas9) with a cytidine deaminase164,165,166,167, such as the activation-induced cytidine deaminase (AID) or rat APOBEC1. These fusion proteins drive somatic hypermutation in locations close to the CRISPR target, thus creating genetic variants at a defined genomic locus without creating DNA breaks. The efficiency of base editing can be further improved by a second fusion to a bacteriophage uracil glycosylase inhibitor (UGI) and restoration of the nicking activity of dCas9, resulting in a third-generation base editor (BE3, APOBEC–XTEN–dCas9(A840H)–UGI) that mediates C to T conversion with up to 37% efficiency167. Taken together, the ability to induce specific mutations by CRISPR provides an emerging and powerful tool for analysing the functional consequences of candidate aberrations in genes in a low-throughput manner. However, despite constant advances in CRISPR technology, it remains inefficient for precise editing of genome sequences, which makes it challenging to apply to modelling the more complex edgetic effects of cancer mutations.
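The outcome of cytidine base editing at a given target can be approximated with a short Python sketch. The editing window used below (protospacer positions 4 to 8, counted from the end distal to the PAM) is an assumption that is not stated in this Review and is exposed as a parameter; the function name predicted_base_edit is hypothetical. The sketch returns the sequence in which every C in the window is converted, which is an upper bound, because actual editing is incomplete and varies between sites.

```python
def predicted_base_edit(protospacer, window=(4, 8)):
    """Return the protospacer with every C inside the editing window
    replaced by T.

    Positions are 1-based and counted from the PAM-distal end. The default
    window is an assumed value for illustration, not a measured property.
    """
    first, last = window
    return "".join(
        "T" if base == "C" and first <= position <= last else base
        for position, base in enumerate(protospacer.upper(), start=1)
    )
```

Comparing the input and output sequences codon by codon would then indicate which amino acid substitutions a given sgRNA could, in principle, introduce.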


In this Review, we summarize computational and high-throughput experimental strategies to systematically characterize cancer genetic mutations in the context of molecular interaction networks at base-pair resolution. Recent advances in systems biology and next-generation sequencing have facilitated the development of functional variomics platforms to evaluate the effect of a large number of disease mutations. Widespread protein–protein and protein–DNA interaction perturbations have been identified across various types of human disease including cancer. It has been shown that different mutations in the same gene frequently result in different interaction perturbation profiles. Taken together, interaction perturbation profiling of disease mutations provides a paradigm for dissecting heterogeneous genetic variants across populations of patients with cancer, which is sorely needed given the large number of uncharacterized patient mutations and their potential effect on cancer phenotype and therapeutic liabilities.

During tumour evolution, mutations arise and accumulate in response to stress signals from the microenvironment or from tumour therapy. During this process, a driver mutation occurs and confers on the tumour a selective advantage. Although other, passenger, mutations do occur, they do not provide any growth advantage. Genetic heterogeneity develops across diverse tumour populations over time. Numerous computational algorithms and tools have been developed to prioritize cancer mutations based on different node- or edge-level functional properties. Although integrative computational analyses achieve some levels of accuracy in predicting disease-causing candidate mutations, a substantial fraction of the top hits should still be experimentally validated. The rapidly increasing functional annotation of cancer-specific mutations from variomics platforms offers the opportunity to iteratively improve the computational predictive tools based on high-quality test data. Furthermore, computational algorithms are often limited in predicting interaction perturbations or gain of function, which are common mutational effects on cancer signalling networks.

Systematic characterization of cancer variants for their effect on interaction networks is crucial to a systems-level understanding of genetic heterogeneity. So far we have focused on heterogeneity across tumour samples; however, intra-tumour heterogeneity also exists. A tumour is made up of many cell types, each of which would have its own set of mutations and underlying networks. Interaction profiles of cancer mutations provide a fundamental link between genotype and phenotype. Network perturbation by mutations facilitates grouping of distinct genotypes that share common effects on interaction profiles underlying a particular phenotype. In addition, the identified perturbed interaction partners allow us to uncover specific targets that are impaired in a mutation-specific context, which may in turn suggest therapeutic biomarkers guiding potential personalized precision medicine. However, the phenotypic diversity of cell types remains an important challenge for network biology: available network databases might not faithfully represent the particular cancer cell types to be investigated; furthermore, cell type composition and infiltration by stromal and immune system cells might affect network wiring.

Cell type specificity and tumour microenvironment need to be taken into account to obtain a comprehensive understanding of cancer mutations. Integration of context-dependent computational resources and improved algorithms would help to deconvolute the functional effects of mutations in a cell type-specific manner. In addition, emerging technologies such as single-cell experimental approaches could help further stratify mutational effects. To construct higher resolution functional networks, it would be essential to incorporate multiple properties of genetic mutations, including gene expression, protein folding and structure, protein–protein and protein–DNA interactions and beyond.



  1. 1.

    et al. Cancer genome landscapes. Science 339, 1546–1558 (2013).

  2. 2.

    & Genome landscapes of disease: strategies to predict the phenotypic consequences of human germline and somatic variation. PLoS Comput. Biol. 12, e1005043 (2016).

  3. 3.

    , & Interactome networks and human disease. Cell 144, 986–998 (2011).

  4. 4.

    et al. Edgotype: a fundamental link between genotype and phenotype. Curr. Opin. Genet. Dev. 23, 649–657 (2013).

  5. 5.

    Coming full circle — from endless complexity to simplicity and back again. Cell 157, 267–271 (2014).

  6. 6.

    , & Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 12, 56–68 (2011). This paper shows network models for molecular and pathway relationships for complex diseases.

  7. 7.

    , , , & Neomorphic mutations create therapeutic challenges in cancer. Oncogene (2016). This paper highlights diverse functional effects of different edgetic or neomorphic mutations, which should be taken into account for designing precision medicine.

  8. 8.

    et al. Systematic functional interrogation of rare cancer variants identifies oncogenic alleles. Cancer Discov. 6, 714–726 (2016). This is one of the first papers showing systematic characterization of distinct cancer hallmark behaviours of rare oncogenic alleles.

  9. 9.

    et al. A proteome-scale map of the human interactome network. Cell 159, 1212–1226 (2014). This is one of the largest scale human interactome network maps identifying novel connectivity modules between cancer proteins.

  10. 10.

    et al. The BioPlex network: a systematic exploration of the human interactome. Cell 162, 425–440 (2015).

  11. 11.

    et al. Human gene-centered transcription factor networks for enhancers and disease variants. Cell 161, 661–673 (2015). This is one of the first studies to characterize the protein–DNA interactions altered by enhancer mutations on a large scale.

  12. 12.

    et al. Widespread macromolecular interaction perturbations in human genetic disorders. Cell 161, 647–660 (2015). This is one of the first papers showing systematic characterization of a large number of mutations involved in ~1,000 human diseases, in terms of their functional effect on protein–protein and protein–DNA interaction networks, and protein folding and stability.

  13. 13.

    , , , & Network-based stratification of tumor mutations. Nat. Methods 10, 1108–1115 (2013). This is a method that integrates tumour genomes with gene networks to cluster together patients with mutations in a similar network 'neighbourhood'.

  14. 14.

    et al. High-resolution network biology: connecting sequence with function. Nat. Rev. Genet. 14, 865–879 (2013).

  15. 15.

    , & Exploring genetic interactions and networks with yeast. Nat. Rev. Genet. 8, 437–449 (2007).

  16. 16.

    & Decoding signalling networks by mass spectrometry-based proteomics. Nat. Rev. Mol. Cell Biol. 11, 427–439 (2010).

  17. 17.

    et al. Role of non-coding sequence variants in cancer. Nat. Rev. Genet. 17, 93–108 (2016). This paper shows recent computational and experimental advances in evaluating the functional effect of non-coding cancer variants.

  18. 18.

    et al. A census of human cancer genes. Nat. Rev. Cancer 4, 177–183 (2004).

  19. 19.

    et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013).

  20. 20.

    et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27–40 (2011).

  21. 21.

    et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).

  22. 22.

    Mutation and cancer: statistical study of retinoblastoma. Proc. Natl Acad. Sci. USA 68, 820–823 (1971).

  23. 23.

    The clonal evolution of tumor cell populations. Science 194, 23–28 (1976).

  24. 24.

    Mutation selection and the natural history of cancer. Nature 255, 197–200 (1975).

  25. 25.

    & Clonal evolution in cancer. Nature 481, 306–313 (2012).

  26. 26.

    , , & Ten years of next-generation sequencing technology. Trends Genet. 30, 418–426 (2014).

  27. 27.

    et al. Mutational landscape and significance across 12 major cancer types. Nature 502, 333–339 (2013).

  28. 28.

    et al. Specific association of human telomerase activity with immortal cells and cancer. Science 266, 2011–2015 (1994).

  29. 29.

    et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013).

  30. 30.

    et al. TERT promoter mutations in familial and sporadic melanoma. Science 339, 959–961 (2013).

  31. 31.

    et al. Cancer. TERT promoter mutations and telomerase reactivation in urothelial cancer. Science 347, 1006–1010 (2015).

  32. 32.

    et al. Cancer. The transcription factor GABP selectively binds and activates the mutant TERT promoter in cancer. Science 348, 1036–1039 (2015).

  33. 33.

    & N-Myc and noncoding RNAs in neuroblastoma. Mol. Cancer Res. 10, 1243–1253 (2012).

  34. 34.

    et al. Effects of a novel long noncoding RNA, lncUSMycN, on N-Myc expression and neuroblastoma progression. J. Natl Cancer Inst. 106, dju113 (2014).

  35. 35.

    et al. Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc. Natl Acad. Sci. USA 99, 15524–15529 (2002).

  36. 36.

    Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).

  37. 37.

    , & The cancer genome. Nature 458, 719–724 (2009).

  38. 38.

    & Lessons from the cancer genome. Cell 153, 17–37 (2013).

  39. 39.

    , , & Clinicopathological relevance of BRAF mutations in human cancer. Pathology 45, 346–356 (2013).

  40. 40.

    et al. Emerging landscape of oncogenic signatures across human cancers. Nat. Genet. 45, 1127–1133 (2013).

  41. 41.

    et al. Whole-genome mutational landscape and characterization of noncoding and structural mutations in liver cancer. Nat. Genet. 48, 500–509 (2016).

  42. 42.

    , , & Adjusting for background mutation frequency biases improves the identification of cancer driver genes. IEEE Trans. Nanobiosci. 12, 150–157 (2013).

  43. 43.

    et al. A landscape of driver mutations in melanoma. Cell 150, 251–263 (2012).

  44. 44.

    et al. MuSiC: identifying mutational significance in cancer genomes. Genome Res. 22, 1589–1598 (2012).

  45. 45.

    et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).

  46. 46.

    & The effects of chromatin organization on variation in mutation rates in the genome. Nat. Rev. Genet. 16, 213–223 (2015).

  47. 47.

    et al. The consensus coding sequences of human breast and colorectal cancers. Science 314, 268–274 (2006).

  48. 48.

    et al. Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science 321, 1801–1806 (2008).

  49. 49.

    et al. An integrated genomic analysis of human glioblastoma multiforme. Science 321, 1807–1812 (2008).

  50. 50.

    et al. The genomic landscapes of human breast and colorectal cancers. Science 318, 1108–1113 (2007).

  51. 51.

    et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 39, D945–D950 (2011).

  52. 52.

    et al. Molecular Biology of the Cell (Garland Science, 2002).

  53. 53.

    et al. Guidelines for investigating causality of sequence variants in human disease. Nature 508, 469–476 (2014).

  54. 54.

    , & Distilling pathophysiology from complex disease genetics. Cell 155, 21–26 (2013).

  55. 55.

    et al. Exploring mechanisms of human disease through structurally resolved protein interactome networks. Mol. Biosyst. 10, 9–17 (2014).

  56. 56.

    & SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 31, 3812–3814 (2003).

  57. 57.

    et al. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res. 40, W452–W457 (2012).

  58. 58.

    , , , & SIFT missense predictions for genomes. Nat. Protoc. 11, 1–9 (2016).

  59. 59.

    et al. Pan-cancer analysis of mutation hotspots in protein domains. Cell Systems 1, 197–209 (2015).

  60. 60.

    et al. Hotspot mutations delineating diverse mutational signatures and biological utilities across cancer types. BMC Genomics 17 (Suppl. 2), 394 (2016).

  61. 61.

    et al. MSEA: detection and quantification of mutation hotspots through mutation set enrichment analysis. Genome Biol. 15, 489 (2014).

  62. 62.

    et al. Personalized genomic analyses for cancer mutation discovery and interpretation. Sci. Transl Med. 7, 283ra53 (2015).

  63. 63.

    et al. Large-scale whole-genome sequencing of the Icelandic population. Nat. Genet. 47, 435–444 (2015).

  64. 64.



Acknowledgements

N.S. would like to acknowledge the following grants: the Cancer Prevention and Research Institute of Texas (CPRIT) New Investigator Grant RR160021, the University of Texas System Rising STARs award, the US National Institutes of Health (NIH)–National Cancer Institute (NCI) grants P30CA016672, U54HG008100 and U01CA168394, and the University Center Foundation via the Institutional Research Grant program at the University of Texas MD Anderson Cancer Center.

Author information

Author notes

    • Song Yi, Shengda Lin & Yongsheng Li

    These authors contributed equally to this work.

Affiliations

  1. Department of Systems Biology, University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA.

    • Song Yi, Yongsheng Li, Wei Zhao, Gordon B. Mills & Nidhi Sahni

  2. Department of Medicine, Stanford University School of Medicine, Stanford, California 94305, USA.

    • Shengda Lin

  3. Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas 77030, USA.

    • Nidhi Sahni



Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Song Yi or Nidhi Sahni.

Supplementary information

Word documents

  1. Supplementary information S1 (table)

    Node-centered computational methods to characterize the function of cancer mutations.


Glossary

Missense mutations

(Also known as non-synonymous mutations). Nucleotide mutations in exons of protein-coding genes that cause amino acid substitutions in the protein.

Frame-shift mutations

Nucleotide mutations in exons of protein-coding genes that cause an alteration to the reading frame of translation and usually result in a premature stop codon and a truncated or non-expressed protein. They typically involve small insertions or deletions of a number of nucleotides that is not divisible by three.

Silent mutations

(Also known as synonymous mutations). Nucleotide mutations in exons of protein-coding genes that do not alter the coded amino acid (due to degeneracy in the genetic code).

Nonsense mutations

Nucleotide mutations in exons of protein-coding genes that change amino acid-encoding codons into stop codons.
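
As a minimal sketch of how the four coding mutation classes defined above can be distinguished computationally, the following Python snippet classifies a single-codon substitution using the standard genetic code, and classifies a small insertion or deletion by whether its length is divisible by three. The function names (`classify_substitution`, `classify_indel`) and the 'stop-loss' and 'in-frame' labels are illustrative assumptions and do not come from any of the tools discussed in this Review; real annotation pipelines additionally account for transcript models, strand and splice sites.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a within-codon substitution (hypothetical helper)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"      # same amino acid (or stop to stop)
    if alt_aa == "*":
        return "nonsense"    # amino acid codon becomes a stop codon
    if ref_aa == "*":
        return "stop-loss"   # label not in the glossary; added for completeness
    return "missense"        # amino acid substitution


def classify_indel(length):
    """Classify an insertion or deletion by its length in nucleotides."""
    if length <= 0:
        raise ValueError("length must be a positive number of nucleotides")
    return "frame-shift" if length % 3 != 0 else "in-frame"


silent_call = classify_substitution("GGT", "GGC")    # Gly -> Gly
missense_call = classify_substitution("GGT", "GAT")  # Gly -> Asp
nonsense_call = classify_substitution("TGG", "TGA")  # Trp -> stop
frameshift_call = classify_indel(1)                  # 1-nt indel shifts the frame
inframe_call = classify_indel(3)                     # 3-nt indel preserves the frame
```

Under these assumptions, a one- or two-nucleotide indel is called a frame-shift, whereas a three-nucleotide indel leaves the reading frame intact, which mirrors the 'not divisible by three' criterion in the definition above.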


ChIP–seq

(Chromatin immunoprecipitation followed by sequencing). Antibody-based immunoprecipitation of a chromatin-associated protein, such as a transcription factor (often epitope tagged) and its potentially interacting crosslinked DNA fragments, followed by sequencing to reveal the identity of these DNA fragments. Overall, this approach reveals the genomic sites of occupancy of the protein of interest.


DNase-seq

Genome-wide sequencing of open chromatin regions that are sensitive to cleavage by DNase I. Open chromatin is enriched for regulatory sequences.

Chromosome conformation capture

A method that analyses the spatial organization of chromatin in a cell by quantifying the interactions between genomic loci that are in proximity in three-dimensional space.

Gene ontology

A unified representation of attributes for genes and gene products across species, which helps functional interpretation of experimental data.

Topological centrality

In molecular interaction networks, topological centrality is an intrinsic network property that measures the overall position and 'connectedness' of a node within the network.
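
The definition above does not commit to a particular measure, and several exist (for example, degree, betweenness and closeness centrality). As a hedged illustration, the sketch below computes normalized degree centrality, the simplest of these, for an undirected toy network. The node names (`HUB`, `P1` to `P4`) and the function name `degree_centrality` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected network.

    Each node's score is its number of distinct neighbours divided by
    (n - 1), where n is the number of nodes appearing in `edges`.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions in this sketch
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


# Hypothetical toy interactome: one hub protein with three partners,
# one of which has an additional partner of its own.
toy_edges = [("HUB", "P1"), ("HUB", "P2"), ("HUB", "P3"), ("P3", "P4")]
centrality = degree_centrality(toy_edges)
most_central = max(centrality, key=centrality.get)
```

In this toy network the hub node receives the highest score, because it contacts three of the four other nodes; measures such as betweenness centrality would instead weight nodes by how many shortest paths pass through them.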


Nicking

Creating a single-strand DNA break.
