Proteins interact with other macromolecules in complex cellular networks for signal transduction and biological function. In cancer, genetic aberrations have traditionally been thought to disrupt the entire gene function. However, it has been increasingly appreciated that each mutation of a gene could have a subtle but unique effect on protein function or network rewiring, contributing to diverse phenotypic consequences across cancer patient populations. In this Review, we discuss the current understanding of cancer genetic variants, including the broad spectrum of mutation classes and the wide range of mechanistic effects on gene function in the context of signalling networks. We highlight recent advances in computational and experimental strategies to study the diverse functional and phenotypic consequences of mutations at base-pair resolution. Such information is crucial to understanding the complex pleiotropic effect of cancer genes and provides a possible link between genotype and phenotype in cancer.
Current cancer therapies targeted to particular genetic lesions are primarily hampered by the extreme genetic heterogeneity observed across patient populations.
Cancer genomic variants and regulatory molecules interact with each other in cellular networks. Network biology has recently emerged as a systems-level approach to stratifying mutations that give rise to markedly different phenotypes.
Different cancer mutations often lead to distinct perturbations in signal transduction networks.
Several computational tools have been developed to analyse functional effects of cancer mutations and to prioritize drivers of oncogenesis.
Various experimental strategies have emerged to study mutation-specific functional effects. Such functional variomics approaches can dissect cancer variants at high resolution.
Integration of computational predictions with systems biology experimental approaches will be crucial for interpreting complex genotype-to-phenotype relationships in human disease including cancer. Together this effort represents a critical step towards precision medicine.
With rapidly evolving next-generation sequencing technologies, there has been an explosion in human genotypic information, particularly for disease mutations associated with numerous types of cancer1,2 (Fig. 1a). However, the functional paths by which heterogeneous genotypic variants lead to diverse phenotypic consequences remain largely unresolved3,4. More importantly, how the multiple genomic aberrations present in a single tumour integrate into the tumour phenotype, including response to therapy and patient outcomes, remains an overarching gap in knowledge in the field5,6. It is therefore crucial to identify the functional roles of distinct cancer mutations. A 'one gene, one function, one disease' model cannot be reconciled with the observation that different mutations in the same gene often lead to markedly diverse phenotypes7,8. It is now clear that genes and gene products do not function in isolation but rather interact with each other in cellular networks9,10. A systems-level understanding of how cancer mutations affect signalling networks is pivotal for interpreting the complex genotype-to-phenotype relationships in terms of tumour behaviour and patient outcomes11,12. This more sophisticated functional understanding of mutations is key to distinguishing drivers from non-pathogenic passengers, as well as to enhancing clinical diagnostics, prognostics and therapeutics13.
Systematic high-throughput functional 'variomics' platforms to assess mutation-specific network perturbations are beginning to emerge11,12. In this Review, we discuss recent advances in the identification and functional characterization of genomic variants in human cancers, as part of an initiative towards creating a functional landscape of cancer genomes. Integration of systems genetics with signalling networks will be crucial for prioritizing cancer-causing variants and for uncovering patient mutation-specific disease mechanisms and their resultant therapeutic liabilities4,6,13. Together, this effort represents a crucial step towards personalized medicine.
In this Review, we first provide a brief overview of different classes of genomic aberration that underlie cancer heterogeneity. We then discuss the importance of systems genetics and network biology for understanding the functional effects of various cancer mutations. To distinguish driver mutations from passenger mutations, we describe a toolkit of recent computational and bioinformatics algorithms to prioritize genomic mutations and to interpret their potential functional effect. To reveal the molecular underpinnings driving selection of particular mutations, we further describe emerging systems genetics technologies that enable the functional characterization of cancer variants in a high-throughput manner. Finally, with the integration of computational and experimental platforms, we aim to provide systems-level strategies to stratify cancer mutations at single nucleotide resolution, thereby bridging genotype to phenotype in cancer.
For brevity, we do not discuss genetic interaction networks or metabolic networks (reviewed in Refs 3,14,15), modelling of network dynamics and the analysis of network centrality (reviewed in Refs 3,6,16) or the functional platforms used to assess variants in non-coding regions (reviewed in Ref. 17).
Genetic heterogeneity in cancer
Genomic instability propels the accumulation of mutations in cancer cells and results in the rapid evolution of cancer genomes in response to both the microenvironmental stresses that occur during tumour evolution and the stresses induced by tumour therapy. Identifying and characterizing cancer driver mutations is among the most pressing needs for understanding the causality and progression of tumours, as well as for developing effective treatments tailored to specific patients with cancer. The plethora of sequencing data from recent whole-genome and whole-exome sequencing projects has begun to enable the elucidation of the mutational landscape of most common cancers and the functional and structural elements of cancer genomes. In this Review, we refer to 'genetic heterogeneity' primarily as distinct mutations between tumours or across patients.
Somatic versus germline mutations. The majority of cancer mutations are somatic. Approximately 90% of cancer genes show somatic mutations and 20% show germline mutations, with 10% showing both somatic and germline mutations18. Some somatic mutations, such as those in telomerase reverse transcriptase (TERT) and TP53 (which encodes p53), frequently occur across cancer lineages, but others can be more tissue specific. Somatic aberrations show much more diverse patterns compared with germline variants, including complex genomic rearrangements, such as chromoplexy19, chromothripsis20 and kataegis21. This is probably because of much reduced evolutionary constraints in somatic cells, as somatic mutations only need to be compatible with viability of the subset of cells in which they are found, whereas germline-inherited mutations will be present body-wide throughout development. The accumulation of somatic mutations in cancer cells is pivotal in cancer progression. The 'two-hit' hypothesis originally postulated that tumours develop as a consequence of a second somatic mutation occurring upon the first inherited or somatic mutation22. It is now clear that more than two 'hits' are needed for full emergence of the cancer phenotype in most tumours, with statistical models placing this at between two and six driver aberrations. Cancer is commonly viewed as an evolutionary process of genetic instability and natural selection, driven by ongoing accumulation of somatic mutations23,24. The continuous somatic evolution of cancers then contributes to genetic heterogeneity1 and clonal expansion25 during tumour progression.
Coding versus non-coding mutations. Protein-coding regions of the genome constitute the 'exome', which represents <2% of the whole genome, but contains ~85% of known disease-related variants26 (Fig. 1b). However, this is likely to be a technology bias as until recently high-throughput approaches to identify disease-related variants in non-coding regions have been limited. Whole-genome sequencing (WGS) has been applied for deep understanding of non-coding regulation in human cancer. By contrast, whole-exome sequencing (WES) has been widely used and has proved to be a reliable and cost-effective approach to reveal the somatic cancer mutation landscape of protein-coding regions27 (Fig. 1c). Non-coding regions contain functional modules that can elicit marked effects on expression profiles. Such cis-regulatory elements include promoters, enhancers, silencers and insulators (Fig. 1b). Mutations in these elements can drastically alter gene regulation. A potent example is TERT, which is reactivated in 80–90% of human cancers28, often through recurrent mutations in its promoter region29,30. These mutations create de novo binding sites for the GA-binding protein (GABP) transcription factors31,32. Non-coding RNAs (ncRNAs) are also frequent targets of genomic aberrations33,34,35.
Identification of mutations in the coding genome can reveal pathways that underlie cancer progression and can identify therapeutic drug targets. Recently, The Cancer Genome Atlas (TCGA) pan-cancer analysis36 has identified numerous cancer aberrations and has helped to assign the abnormalities to physical complexes and pathways. Indeed, the aggregate of mutations in a complex, such as the SWI/SNF complex, or in a pathway, such as the homologous recombination pathway, establishes the importance of that complex or pathway in the tumorigenic process and further emphasizes the need to characterize aberrations not as independent events but rather as parts of functional machines.
Driver versus passenger aberrations. Identification of cancer variants has been dominated by genotyping-by-sequencing. Indeed, more than 3 million independent variants in coding regions have been identified by sequencing, with only a small subset of these being functionally annotated. Importantly, not all genetic alterations are relevant to tumour progression. The terms 'driver' and 'passenger' distinguish causal from random events in cancers37. A driver mutation provides a selective advantage to the tumour clone in its microenvironment at some point during its history, but is not necessarily required to sustain tumour growth throughout the evolution of the tumour. Driver aberrations can be derived from major events, such as chromosomal gains and losses, chromosomal shattering and chromosomal chains38. Alternatively, driver mutations can be derived from mutational hotspots, for example, the V600E mutation in BRAF39.
In cancer, there are multiple classes of mutation: missense mutations, frame-shift mutations, silent mutations, nonsense mutations, insertions or deletions, non-coding mutations and others. The relative frequency of the classes of genomic aberration varies markedly across cancer lineages, although overall, missense mutations are by far the most frequent class of coding aberrations in human cancers (Fig. 2a,b). Kidney clear-cell carcinoma, glioblastoma multiforme, hepatocellular carcinoma, acute myeloid leukaemia, colorectal carcinoma and endometrial carcinoma, with the exception of the serous-like subtype, are dominated by single nucleotide mutations, whereas almost all serous ovarian and breast carcinoma samples, and a large fraction of lung and head and neck squamous cell carcinomas, are dominated by copy number variations40,41.
The prevalence of random mutations, non-cancer tissue mixed in tumours, clonal heterogeneity and ploidy variation makes it difficult to accurately 'call' mutations irrespective of whether the mutation is a driver or a passenger. There are several strategies to improve mutation calling. First, sufficient sequencing depth is required. WGS is often carried out with 30- to 60-fold coverage, WES is often carried out with 100- to 150-fold coverage and targeted sequencing of candidate gene panels is typically carried out at 200- to 2000-fold coverage38. Second, accurate estimation of the background mutation rate is essential for the discrimination of driver versus passenger mutations, as passenger mutations are expected to be present at similar frequencies to background mutation rates whereas driver mutations are expected to be present at higher frequencies because of the selective advantage they engender. Background mutation rates have traditionally been derived from synonymous mutation rates42 and intronic and untranslated region (UTR) mutations43. Despite lower depth in WGS or WES readouts compared with targeted gene panels, WGS and WES allow background mutation rates to be estimated with higher accuracy. Other genomic parameters have also been taken into account in modern algorithms such as MuSiC44 and MutSigCV45. It is likely that mutation rates are not uniform across genomes46 and that regional estimates of mutation rates that take into account replication timing and other factors are needed.
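The frequency argument above can be illustrated with a minimal sketch. It assumes, purely for illustration, a single uniform per-base background mutation rate and treats the number of mutations observed in a gene across a cohort as Poisson-distributed. The function names, the gene length, the cohort size and the rate are all hypothetical, and this is not the procedure used by MuSiC or MutSigCV, which model additional genomic covariates.

```python
import math

def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), as 1 minus the lower tail."""
    term = math.exp(-lam)          # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term                # add P(X = i)
        term *= lam / (i + 1)      # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)

def recurrence_pvalue(observed, gene_length_bp, n_samples, background_rate):
    """P-value for seeing at least `observed` mutations in a gene.

    Assumption: every base in every sample mutates independently at the
    same background_rate, so the expected count is length * samples * rate.
    """
    expected = gene_length_bp * n_samples * background_rate
    return poisson_tail(observed, expected)

# Hypothetical gene: 1,500 bp, 1,000 tumours, 1e-6 mutations per base.
p_recurrent = recurrence_pvalue(12, 1500, 1000, 1e-6)   # far above expectation
p_background = recurrence_pvalue(2, 1500, 1000, 1e-6)   # close to expectation
```

Under these assumptions the expected count is 1.5 mutations, so 12 observed mutations yield a very small p-value whereas 2 do not. Because the result depends directly on `background_rate`, an inaccurate or regionally variable rate shifts every p-value, which is why the background estimate matters so much.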
A cancer network rewiring perspective
Systems genetics: from pathways to cancer networks. Initial studies of the genetic heterogeneity of human cancers originated from pioneering WES projects in breast, pancreatic, colorectal and brain cancers47,48,49,50. These studies showed that cancer genomes are highly complex with 50–100 somatic alterations in each tumour. The Catalogue of Somatic Mutations in Cancer (COSMIC), which catalogues somatic mutation frequencies in tumours and tumour-derived cell lines, as of the latest Release v76 (February 2016), contained more than 3.9 million unique coding mutations in more than 5,000 human genes from more than 1 million tumour samples51. Once identified, cancer genes and mutations need to be assigned to functional and regulatory pathways. Common sets of conserved proteins participate in 'core' functional pathways that mediate essential cellular processes in different cell types and tissues52. However, it remains unclear how these conserved pathways function in different contexts to achieve signalling specificity and how mutations in cancer cells disrupt signalling pathways and cellular functions.
Although distinguishing causal disease variants from non-pathogenic polymorphisms is often insufficient to establish mechanisms or to predict phenotypic outcomes, identifying causal mutations remains a key, but challenging, step for functional interpretation of cancer genomes53. Classic gene knockout or knockdown approaches cannot always resolve the diverse biological functions mediated by different mutations of the same gene54. Therefore, understanding how specific variants affect molecular interaction networks is crucial for interpreting complex genotype-to-phenotype relationships in cancer3,14,54.
Effect of genetic mutations on cancer networks. Protein products of mutated cancer genes do not function in isolation but rather are part of highly interconnected cellular networks3, which are often depicted as nodes (molecules) and edges (interactions)3,6,55 (Fig. 2c). In interaction networks, a genetic mutation can behave either like a complete gene knockout, with loss of all of the protein's interactions in the network, or as an interaction-specific ('edgetic') perturbation, with loss or gain of specific interactions12 (Fig. 3a). Edgetic mutations tend to be located at interaction interfaces with protein partners. For example, the edgetic mutations R24C and R24H in cyclin-dependent kinase 4 (CDK4), which are associated with melanoma, reside at the protein interaction interface with the partner CDK inhibitor 2C (CDKN2C; also known as p18INK4C) (Fig. 3b). Similarly, the edgetic mutation F194S in fructose bisphosphatase 1 (FBP1) is located at the interaction interface (Fig. 3c).
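The distinction between a knockout-like and an edgetic effect can be sketched on a toy network. In this minimal example the network is a set of undirected edges. Only the CDK4 and CDKN2C node names come from the text; `PARTNER_A`, `PARTNER_B`, the choice of which edge is lost and both function names are hypothetical.

```python
def knockout(edges, node):
    """Node removal: the node and every edge that touches it are lost."""
    return {edge for edge in edges if node not in edge}

def edgetic(edges, lost=(), gained=()):
    """Edge-specific perturbation: only the listed edges are lost or gained."""
    lost = {frozenset(pair) for pair in lost}
    gained = {frozenset(pair) for pair in gained}
    return (edges - lost) | gained

# Toy wild-type network; the PARTNER_* nodes are placeholders.
wild_type = {
    frozenset(pair)
    for pair in [
        ("CDK4", "CDKN2C"),
        ("CDK4", "PARTNER_A"),
        ("PARTNER_A", "PARTNER_B"),
    ]
}

null_like = knockout(wild_type, "CDK4")
interface_mutant = edgetic(wild_type, lost=[("CDK4", "CDKN2C")])
```

Here `null_like` keeps only the edge that does not involve CDK4, whereas `interface_mutant` loses a single interaction and keeps the rest, which is the kind of difference that a gene-level knockout experiment cannot resolve.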
Prioritization and comprehensive understanding of driver mutations requires the integration of systematic large-scale experimental approaches with computational algorithms. In the next two sections, we cover these aspects in detail.
Computational prediction of drivers
To predict driver mutations, computational systems biology has established and implemented modelling algorithms based on existing big data sets in cancer. In this section, we summarize recent computational methodologies for characterizing the function of cancer mutations from the network perspective and classify them into three main categories: node-level, subnetwork-level and edge-level predictions (Fig. 4; Table 1; see Supplementary information S1 (table)).
Node-centred effects of cancer mutations
Sequence features. One of the most commonly used methods for estimating the functional effect of mutations is sequence comparison based on multiple alignment. These methods assume that amino acid substitutions in highly conserved positions are deleterious (for example, SIFT56,57 and SIFT4G58). Some algorithms also incorporate sequence-based data with clinical data for inferring the relationships between mutations and the affected genes44. In addition, drivers can be identified by finding genes that harbour significantly more mutations than expected by chance59,60, using tools such as MutSig45 and MSEA61. Although these methods are informative in driver prioritization, their accuracy may be influenced by empirically observed local mutation frequencies. The available large-scale WES or WGS data sets from several human genome projects, such as the 100,000 genomes project62 and the Icelandic genome project63, may provide valuable information for building background mutation models. Moreover, although these algorithms have reasonable predictive value for loss-of-function aberrations, they are not very accurate in predicting gain-of-function or edgetic aberrations.
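The conservation assumption behind alignment-based methods can be illustrated with a minimal sketch that scores each column of a multiple alignment by its Shannon entropy. This is a generic conservation measure chosen for illustration, not the scoring scheme of SIFT or any other tool named above. The toy alignment and the function names are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conservation_scores(alignment):
    """Return one score per column: 1.0 is invariant, lower is more variable.

    Assumption: sequences are pre-aligned, equal in length and gap-free;
    entropy is normalized by log2(20) for the 20 standard amino acids.
    """
    max_entropy = math.log2(20)
    return [1.0 - column_entropy(col) / max_entropy for col in zip(*alignment)]

# Hypothetical alignment of four homologous peptide fragments.
toy_alignment = ["MKVL", "MRVL", "MKIL", "MQVL"]
scores = conservation_scores(toy_alignment)
```

In this toy alignment the first and last columns are invariant and receive the maximum score, so a substitution there would be flagged as more likely deleterious than one in the variable second column.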
Structural features. Computational tools have been designed by integrating structural features with sequence information (for example, PolyPhen-2 (Ref. 64) and STRUM65). Some methods rely on the evidence that many driver mutations recurrently occur in specific structural regions of proteins (for example, protein domains and disordered regions)2,66,67,68,69 or disrupt the active sites (for example, phosphorylation sites)70,71,72. Recent methods map genetic mutations onto protein three-dimensional structures to evaluate the functional effect of mutations at high resolution73,74,75. In addition, some algorithms integrate multiple evolutionary and structural features to evaluate the disease-causing potential of mutations, such as CanDrA76 and MutationTaster77. Last but not least, other methods prioritize driver mutations on the basis of their location at the structural binding sites for small molecules (for example, CanBind78 and SGDriver79). Notably, most of these approaches predict mutational effects on the function of coding genes.
Regulatory features. The Encyclopedia of DNA Elements (ENCODE) project80 has provided a comprehensive map of regulatory elements by advanced techniques such as chromatin immunoprecipitation followed by sequencing (ChIP-seq), DNase-seq and chromosome conformation capture. Several computational methods to investigate the regulatory effects of cancer mutations have been proposed on the basis of various regulatory features in ENCODE (for example, ANNOVAR81,82). To score the deleterious consequences of genetic variants, some tools integrate a wide range of annotations (including genomic and epigenomic features) into one metric (for example, CADD83, GWAVA84, FitCons85 and deltaSVM86). Although these methods help to prioritize driver mutations, they often neglect the chromatin structural context of these regulatory regions. To overcome this problem, other algorithms were developed for predicting driver variants through the integration of chromatin effects87 and high-dimensional regulatory interactions88. Together, these predictive methods have established a useful toolkit to explore the function of cancer mutations in regulatory regions and their effect on the interactions with target genes.
Significantly mutated subnetworks or pathways
Although the above-mentioned computational methods have reached a reasonable level of accuracy for predicting loss-of-function aberrations, multiple lines of evidence have shown that integration of gene ontology and network features is crucial in providing a holistic measure of the functional consequence of a cancer mutation. Some algorithms assess the functional effect of cancer mutations taking into account the observation that genes with distinct ontology terms possess different baseline tolerance for deleterious mutations89. To further enhance the predictive power, other methods prioritize cancer-causing mutations by integrating sequence and structural features with gene ontology similarity (for example, CanPredict90). Recently, the rapid accumulation of protein interaction network data has provided a new basis for studying the topological features of cancer genes and mutations in cellular networks. It has been shown that cancer genes tend to possess high topological centrality, even higher than that found in essential genes. Several predictive algorithms combine network centrality with large-scale genomic resources for prioritizing variants in cancer (for example, SuSPect91 and FunSeq2 (Ref. 92)). It has been shown that network centrality helps to discriminate between disease-associated and tolerated mutations.
An important observation from analyses of the landscape of cancer mutations is that different tumours or patients with cancer show distinctly different mutational profiles. Furthermore, mutated genes tend to fall into a limited number of recurrently mutated subnetworks or pathways. This observation has stimulated systems-level approaches for detecting possible driver mutations and integrative analyses to identify significantly altered pathways. Using network approaches, several algorithms prioritize mutations in cancer on the basis of their effects on transcriptional output (for example, DIGGIT93) or their links to dysregulated genes from gene expression data (for example, DriverNet94, TieDIE95 and OncoIMPACT96). These algorithms are informative in identifying network modules that are related to downstream transcriptional changes induced by cancer mutations. In addition, some methods identify cancer or subtype-related subnetworks by diffusing cancer mutations throughout a network based on network propagation process (for example, VarWalker97, HotNet98, NBS13 and HotNet2 (Ref. 99)).
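The idea of diffusing mutations through a network can be sketched with a generic random walk with restart. This is a minimal illustration under stated assumptions (an undirected, unweighted network in which each node splits its score equally among its neighbours), not the exact procedure of any of the tools named above. The function name, the `restart` parameter and the four-node path are hypothetical.

```python
def propagate(adjacency, seed_scores, restart=0.5, n_iter=100):
    """Random walk with restart on an undirected network.

    adjacency: dict mapping each node to the set of its neighbours.
    seed_scores: dict mapping mutated nodes to their initial score.
    Each iteration applies F <- (1 - restart) * W F + restart * F0,
    where W sends a node's score equally to all of its neighbours.
    """
    nodes = list(adjacency)
    scores = {n: seed_scores.get(n, 0.0) for n in nodes}
    for _ in range(n_iter):
        updated = {}
        for n in nodes:
            incoming = sum(scores[m] / len(adjacency[m]) for m in adjacency[n])
            updated[n] = (1 - restart) * incoming + restart * seed_scores.get(n, 0.0)
        scores = updated
    return scores

# Hypothetical path network A - B - C - D with a single mutated gene, A.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
smoothed = propagate(path, {"A": 1.0})
```

After propagation the score decays with network distance from the mutated node, so genes that sit close to many mutated genes accumulate signal even when they are not mutated themselves. A higher `restart` value keeps the signal more local.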
Edgetic effects of cancer mutations
In this section, we discuss the computational methods to functionally characterize the edgetic effects of cancer mutations, especially in the context of protein–protein, transcription factor–gene and microRNA (miRNA)–gene interaction networks.
Protein–protein interaction context. The prediction of the effect of a cancer mutation on protein–protein binding can be used to identify driver mutations. Some methods predict deleterious mutations on the basis of the mutation-induced changes in binding free energy (BeAtMuSiC100) or in their 3D protein complex context (Structure-PPi101). Other methods evaluate the effects of mutations on protein interactions on the basis of force fields, statistical potentials and fast side-chain optimization algorithms (for example, MutaBind102). More globally, several algorithms (for example, dSysMap103) predict drivers by mapping missense mutations onto the structurally annotated human interactome from Interactome3D104, which is a valuable resource for exploring the edgetic role of disease mutations. A disadvantage of these methods is that they rely on known three-dimensional structures that are only available for a small proportion of proteins. By design, most studies of edgetic mutations have focused on loss of interaction; however, it is likely that cancer mutations could also result in a gain of interaction105. These edgetic effects are as yet poorly predicted by current algorithms and require both the development of new algorithms and their iterative improvement using experimental data71,72.
Gene regulatory context. Edgetic modelling of cancer mutations is not limited to protein interactions but can be applied to transcription regulatory networks in different contexts106,107 and in non-coding regions11,12. Many mutations identified by genome-wide association studies (GWAS) are likely to be regulatory single nucleotide polymorphisms (SNPs) that affect the ability of a transcription factor to bind to DNA. On the basis of this hypothesis, some algorithms score mutant alleles with a position weight matrix (PWM) to detect disruptive transcription factor mutations (for example, is-rSNP108). Additional tools annotate cancer drivers by calculating the change of a binding site caused by genetic mutations (for example, HaploReg109 and OncoCis110). Moreover, biophysical modelling of protein–DNA interactions helps to predict SNPs that cause considerable changes in the binding affinity of transcription factors (for example, BayesPI-BAR111). Finally, the complex miRNA–gene regulatory networks have been shown to control many key cellular processes that are dysregulated in cancers112,113. Several databases have been constructed for compiling mutations that are predicted to perturb miRNA-mediated gene regulation, such as Patrocles114, SomamiR115 and PolymiRTS116,117,118. Together, these available tools are invaluable in predicting a large number of cancer-associated regulatory mutations in signalling networks.
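The PWM-based scoring described above can be sketched as a comparison of log-odds scores for the reference and mutant alleles. This is a minimal illustration, not the method of is-rSNP or any other named tool. The four-position matrix, the uniform background of 0.25 and the function names are hypothetical, and the matrix is assumed to contain no zero probabilities.

```python
import math

def log_odds_score(pwm, site, background=0.25):
    """Sum of log2(probability / background) over the positions of a site."""
    return sum(
        math.log2(pwm[i][base] / background) for i, base in enumerate(site)
    )

def allele_effect(pwm, ref_site, alt_site):
    """Score change caused by a variant; negative values weaken the site."""
    return log_odds_score(pwm, alt_site) - log_odds_score(pwm, ref_site)

# Hypothetical matrix: one dict of base probabilities per motif position.
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]

# A G>T change at the third position of the reference site AGGA.
delta = allele_effect(toy_pwm, "AGGA", "AGTA")
```

A negative `delta` indicates that the variant moves the site away from the motif, whereas a positive value would correspond to a strengthened or newly created site, as in the TERT promoter example discussed earlier.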
Systems-level experimental platforms
Functional analysis of cancer genes and mutations is key to understanding tumorigenic mechanisms and to developing therapeutic methods. Advances in large-scale experimental platforms and screens have revolutionized our ability to study cancer mutations and have begun to reveal the functional networks of cancer mutations. In this section, we focus on recent advances in functional characterization of coding mutations on a large scale.
Transcriptome profiles altered by mutations. Transcriptome profiling has been extensively used in the past decade for functional genomics40,119. In human cancer, gene expression profiles in tumours are often compared with those in the matched control tissues. RNA sequencing (RNA-seq) is one of the most common approaches for transcriptomic studies. In RNA-seq, total RNA is extracted from mutant or control samples, followed by reverse transcription to generate a cDNA library (Fig. 5a). After adaptors are added, the library can be processed with next-generation sequencing. Computational algorithms are available to facilitate downstream RNA-seq data analysis. Recently, another transcriptomic platform has emerged: the library of integrated network-based cellular signatures (LINCS) L1000 (Ref. 120). LINCS L1000 can profile gene expression changes following genetic perturbation of cell lines (for example, by mutations) at high throughput. L1000 detects transcript abundance with optically addressed microspheres and a flow cytometric system (Fig. 5b). As a result, L1000 can directly measure a reduced representation of the transcriptome and the changes in it caused by a genetic mutation. Taken together, these techniques are robust in their assessment of global RNA expression levels for a given genetic background.
Proteome profiles altered by mutations. To monitor protein expression levels in mutants on a large scale, antibody-based platforms for targeted or global quantification of protein expression have been developed. Reverse-phase protein array (RPPA) technology (Fig. 5c) is a type of protein microarray that uses antibodies to detect relative expression levels of proteins in tissue or cell lysates from hundreds of samples simultaneously. RPPA has been applied to a large number of mutation-bearing tumour samples from patients with cancer. A stringent antibody validation procedure must be in place to ensure the sensitivity, specificity and robustness of the platform. In addition, RPPA enables protein profiling with a small amount of sample and is a cost-effective technology. At present, most RPPA data sets include 150–300 antibodies that measure total proteins or specific modifications, such as phosphorylation, cleavage and fatty acid alteration.
As a result of these technical advantages, a number of studies have used the RPPA platform to analyse the expression of proteins involved in cell cycle progression, apoptosis, signalling network activities and other key pathways that are associated with specific mutation variants. Furthermore, RPPA has proved to be an efficient approach to assess functions of rare mutations in a high-throughput manner and to identify driver mutations. A recent study121 characterizing phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit-α (PIK3CA) mutations using RPPA showed that, even though the effect of its hotspot mutations is well-established, differential oncogenic activity and variant-specific activation of PI3K signalling and other pathways, such as the mitogen-activated protein kinase (MAPK) pathway, have been observed across lower frequency mutations. RPPA has also been used to characterize mutations in PIK3 regulatory subunit 1 (PIK3R1), showing that mutations can have marked edgetic effects through interrupting protein–protein interactions (PPIs) and also through gaining new interaction partners122,123,124. These studies suggest mutation-specific targeting as a potentially more efficient approach in precision medicine.
Affinity purification coupled with mass spectrometry (AP–MS) enables the detection of protein expression and PPIs in near-physiological conditions. After affinity purification using an antibody against the bait protein, mass spectrometry is then used to provide global and targeted profiling of protein expression (or modification) (Fig. 5d). Following normalization, the peak intensity patterns are analysed and compared across multiple samples125. To study interaction changes by cancer mutations, AP–MS has been applied to characterize changes in PPIs by melanoma-associated mutations in human CDK4 (Ref. 125). Various quantitative techniques have been applied to MS-based proteomic analysis, including label-free, metabolic labelling (for example, stable isotope labelling with amino acids in cell culture (SILAC)) and chemical labelling (for example, isobaric tag for relative and absolute quantification (iTRAQ) or tandem mass tag (TMT)). Recent MS-based studies using iTRAQ labelling have shown the ability to identify 8,000–11,000 proteins and 25,000 phosphosites per tumour on average126,127. However, these approaches require a large amount of input material, remain time consuming and are costly. Although restricted by the number of available antibodies, RPPA is a robust approach to evaluate protein expression levels under different conditions or in distinct cell types. By contrast, AP–MS provides a complementary assessment of protein expression, which is global but less specific.
Protein–protein interaction changes induced by mutations. Genetic mutations can impair protein interactions. Mechanistic understanding of cancer-associated mutations requires finding the molecular interactions and biochemical activities that these mutations perturb. For example, the cancer-associated C305F mutation in the zinc finger domain of MDM2 causes the loss of its binding to ribosomal proteins. This interaction perturbation disrupts the ribosomal stress response, contributing to tumorigenesis128. In addition, the missense mutation R172H of p53 leads to a gain of interaction with the tumour suppressor disabled homologue 2-interacting protein (DAB2IP), which inhibits DAB2IP function and increases the invasive behaviour in cancer cells105. Although some cancer mutations have been characterized, the functional mechanism behind most variants remains elusive4,7. Network approaches using interactome maps have been successful for highlighting candidate cancer genes and disease modifier genes9,129; however, the effect of most causal variants on interaction networks remains mainly unknown.
Recently, several large-scale functional variomics and proteomics platforms have been applied to profile mutation-induced changes of molecular interactions, especially PPIs, relative to their wild-type counterparts. The high-throughput Gateway-compatible enhanced yeast two-hybrid130 (HT-eY2H) system (Fig. 5e) and the protein fragment complementation assay (PCA)131 (Fig. 5f) have been implemented to detect PPI alterations. In these systems, the pair of protein partners is each fused to a fragment of a transcription factor or an enzyme, which is stable but inactive in isolation, whereas PPIs reconstitute the transcription factor or enzyme function. A recent investigation of genetic variant-specific effects on PPIs on a large scale across diverse human diseases including cancer identified that, in comparison with non-disease polymorphisms, disease mutations were more likely to associate with interaction perturbations12.
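Comparing a variant's interaction profile with its wild-type counterpart can be sketched as a set comparison. The category labels, function name and partner names below are illustrative assumptions, not the classification scheme of the cited study.

```python
from typing import Dict, Set


def classify_perturbation(wild_type: Set[str], mutant: Set[str]) -> str:
    """Label a variant by how its interaction partners differ from wild type."""
    if not wild_type:
        raise ValueError("wild-type profile has no interactions to compare against")
    lost = wild_type - mutant
    gained = mutant - wild_type
    if not lost and not gained:
        return "unperturbed"
    if lost == wild_type and not gained:
        return "all interactions lost"
    # Some edges are lost or gained while the rest of the profile is kept.
    return "edge-specific"


# Hypothetical profiles for three variants of one gene.
wild_type_partners = {"P1", "P2", "P3"}
variant_profiles: Dict[str, Set[str]] = {
    "variant_1": {"P1", "P2", "P3"},
    "variant_2": {"P1"},
    "variant_3": set(),
}
labels = {
    name: classify_perturbation(wild_type_partners, partners)
    for name, partners in variant_profiles.items()
}
print(labels)
```

Keeping the lost and gained sets separate makes it straightforward to report which specific partners a given mutation affects, rather than only whether the profile changed.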
Protein–DNA interaction changes by mutations. WGS has revealed abundant genetic variation affecting not only coding sequences, but also non-coding regulatory elements. For example, many point mutations in the transcription factor runt-related transcription factor 1 (RUNX1) cause defective DNA binding, resulting in a familial platelet disorder that predisposes individuals to acute myeloid leukaemia132. Frequent mutations detected in the TERT promoter create de novo binding motifs for ETS transcription factors and play a central role in cancer-specific telomerase activation31,32. Although protein–DNA interactions have been characterized in a few cases, it remains unclear how the majority of transcription factor mutations or non-coding DNA mutations affect their interactions.
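How a point mutation can create a de novo binding motif can be illustrated with a simple string scan. The sketch assumes the ETS core motif GGAA (TTCC on the opposite strand) and uses a short toy sequence, not the actual TERT promoter; real motif analysis would normally use position weight matrices rather than exact matches.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_motif(seq: str, motif: str) -> int:
    """Count exact motif matches on either strand."""
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in targets)


def creates_motif(ref: str, pos: int, alt: str, motif: str = "GGAA") -> bool:
    """True if substituting `alt` at 0-based `pos` adds a motif occurrence."""
    mutant = ref[:pos] + alt + ref[pos + 1:]
    return count_motif(mutant, motif) > count_motif(ref, motif)


# Toy reference sequence (hypothetical): a C>T change at index 4 yields TTCC,
# the reverse complement of GGAA.
toy_ref = "GCCCCTCCGGG"
print(creates_motif(toy_ref, 4, "T"))
print(creates_motif(toy_ref, 0, "A"))
```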
ChIP-seq assays have been extensively used to map genome-wide transcription factor occupancy profiles (Fig. 5g), for example in the ENCODE project133,134. In addition, several systems biology platforms have emerged to study protein–DNA interactions on a large scale11,107,135,136, such as enhanced yeast one-hybrid (eY1H) and protein-binding microarrays (PBMs). In the eY1H assay, a putative regulatory DNA sequence is used as bait to search for transcription factors that bind to that DNA sequence in yeast cells137. A reporter gene is often placed downstream of the DNA sequence to assess protein–DNA interactions (Fig. 5h). Using these assays, a systematic study on the binding of ~1,000 human transcription factors to a large number of enhancer mutations found widespread protein–DNA interaction perturbations in disease, which correlate well with target gene expression changes11. Finally, large-scale transcription factor-binding activity can be evaluated using PBMs107 (Fig. 5i). A recent systematic study investigated transcription factor variants for their DNA-binding affinity using PBMs and identified that individuals with distinct mutations have unique transcription factor DNA-binding profiles, which may contribute to phenotypic variation107.
Pleiotropic mutational effects and integrative analysis. In cancer, mutations occur in the context of genomic, transcriptomic and/or epigenetic aberrations. To gain a systems-level understanding of the functional effects of mutations, an integrative analysis of multi-omics data sets (such as gene expression, DNA copy number and DNA methylation) is crucial. Several approaches have been proposed to address this direction. XSeq138 analyses the effect of somatic mutations by incorporating gene expression, patient mutations and a gene interaction network. PARADIGM-SHIFT139 is another example that infers mutated gene activity from gene expression and copy number in the context of genetic pathways. Integration of proteomics analysis with genomic data has enabled the detection of global proteomic patterns associated with potential driver genetic lesions. For example, a study of lung adenocarcinoma cell lines integrated differential protein expression data with distinct p53 mutational status and identified an enrichment of key functional pathways, including epithelial adhesion, immune and stromal cells and mitochondrial function140. Given that genomic mutations often act in a cell type-specific and condition-dependent manner, such integrative modelling is more likely to resolve functional effects of mutations by controlling for other changes.
Functional validation by CRISPR in cancer. CRISPR has emerged as a powerful and flexible tool141,142,143,144 to systematically interrogate cancer genomes. Together with other technologies, CRISPR has produced valuable data on the identification of new cancer genes as well as on the functional consequence of driver mutations. In the most widely used approach for CRISPR-based genome editing, a CRISPR-associated (Cas) nuclease, usually Streptococcus pyogenes Cas9, is guided to a genomic target site by single guide RNAs (sgRNAs), where it creates DNA double-strand breaks (DSBs)145. DSBs are typically repaired by non-homologous end-joining (NHEJ), leaving a random sequence scar of a small insertion or deletion (indel) that can inactivate the targeted gene of interest146,147 (Fig. 5j). Current genome-wide CRISPR screen libraries contain 7 × 10^4–2 × 10^5 sgRNAs, with 3–12 sgRNAs for each gene148,149,150,151,152,153,154,155. Current CRISPR screens are useful for revealing gene-level information, but they lack the resolution to distinguish different mutations within a gene.
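Choosing where an sgRNA can direct Cas9 can be sketched as a sequence scan. The example assumes the commonly described requirement for S. pyogenes Cas9 of a 20-nucleotide protospacer followed immediately by an NGG protospacer-adjacent motif (PAM), a detail not stated in the text above. It scans only the given strand, and the function name and input sequence are hypothetical.

```python
import re
from typing import List, Tuple


def find_target_sites(seq: str, spacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for every protospacer followed by NGG."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical input: a 20-nt stretch followed by an AGG PAM and two extra bases.
example = "ACGT" * 5 + "AGG" + "CC"
for start, protospacer, pam in find_target_sites(example):
    print(start, protospacer, pam)
```

A practical design tool would also scan the reverse strand and score candidate guides for off-target matches, which this sketch omits.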
To precisely model specific cancer mutations by CRISPR, a homologous DNA template (single stranded or double stranded) is provided to convert the DSB site into the desired sequence in a process known as homology-directed repair (HDR) (Fig. 5j). However, HDR is relatively inefficient compared with NHEJ and can be further corrupted by indels. Some recent efforts have been made to improve the applicability of HDR in specific mutation editing156,157,158,159. Synchronizing the expression of Cas9 with cell cycle progression160 or treating cells with two small molecules, L755507 and brefeldin A161, could improve the HDR efficiency. The discovery of the smaller Cas9 from Staphylococcus aureus allows the packaging of Cas9 and sgRNA expression constructs into the highly versatile adeno-associated virus (AAV) delivery vehicle162, enabling efficient HDR in specific organs such as the liver163.
Recently, direct base change has been achieved by fusing nuclease-dead Cas9 (dCas9) with a cytidine deaminase164,165,166,167, such as the activation-induced cytidine deaminase (AID) or rat APOBEC1. These fusion proteins drive somatic hypermutation in locations close to the CRISPR target, thus creating genetic variants at a defined genomic locus without creating DNA breaks. The efficiency of base editing can be further improved by a second fusion to a bacteriophage uracil glycosylase inhibitor (UGI) and restoration of the nicking activity of dCas9, resulting in a third-generation base editor (BE3, APOBEC–XTEN–dCas9(A840H)–UGI) that mediates C-to-T conversion with up to 37% efficiency167. Taken together, the ability to induce specific mutations by CRISPR provides an emerging and powerful tool for analysing the functional consequences of candidate aberrations in genes in a low-throughput manner. However, despite constant advances in CRISPR technology, precise editing of genome sequences remains inefficient, making it challenging to model the more complex edgetic effects of cancer mutations.
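The outcome of C-to-T base editing at a target site can be sketched as an idealized string transformation. The editing window used here (protospacer positions 4 to 8, counted from the PAM-distal end) is an assumption exposed as a parameter, and the sketch converts every cytosine in the window, whereas real editing is partial. The protospacer sequence is hypothetical.

```python
from typing import Tuple


def base_edit(protospacer: str, window: Tuple[int, int] = (4, 8)) -> str:
    """Convert every C inside the 1-based, inclusive window to T (idealized)."""
    start, end = window
    edited = []
    for position, base in enumerate(protospacer.upper(), start=1):
        if start <= position <= end and base == "C":
            edited.append("T")
        else:
            edited.append(base)
    return "".join(edited)


# Hypothetical 20-nt protospacer with cytosines at positions 4 and 6.
original = "ACGCACGTCCAAGGTTACGA"
edited = base_edit(original)
print(original)
print(edited)
```

Because all cytosines in the window are eligible, a target with several cytosines near the intended base can pick up unintended bystander changes, which is one reason single-variant modelling with this approach needs careful guide selection.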
In this Review, we summarize computational and high-throughput experimental strategies to systematically characterize cancer genetic mutations in the context of molecular interaction networks at base-pair resolution. Recent advances in systems biology and next-generation sequencing have facilitated the development of functional variomics platforms to evaluate the effect of a large number of disease mutations. Widespread protein–protein and protein–DNA interaction perturbations have been identified across various types of human disease including cancer. It has been shown that different mutations in the same gene frequently result in different interaction perturbation profiles. Taken together, interaction perturbation profiling of disease mutations provides a paradigm for dissecting heterogeneous genetic variants across populations of patients with cancer, which is sorely needed given the large number of uncharacterized patient mutations and their potential effect on cancer phenotype and therapeutic liabilities.
During tumour evolution, mutations arise and accumulate in response to stress signals from the microenvironment or from tumour therapy. During this process, a driver mutation occurs and confers on the tumour a selective advantage. Although other, passenger, mutations do occur, they do not provide any growth advantage. Genetic heterogeneity develops across diverse tumour populations over time. Numerous computational algorithms and tools have been developed to prioritize cancer mutations based on different node- or edge-level functional properties. Although integrative computational analyses achieve some levels of accuracy in predicting disease-causing candidate mutations, a substantial fraction of the top hits should still be experimentally validated. The rapidly increasing functional annotation of cancer-specific mutations from variomics platforms offers the opportunity to iteratively improve the computational predictive tools based on high-quality test data. Furthermore, computational algorithms are often limited in predicting interaction perturbations or gain of function, which are common mutational effects on cancer signalling networks.
Systematic characterization of cancer variants for their effect on interaction networks is crucial to a systems-level understanding of genetic heterogeneity. So far we have focused on heterogeneity across tumour samples; however, intra-tumour heterogeneity also exists. A tumour is made up of many cell types, each of which would have its own set of mutations and underlying networks. Interaction profiles of cancer mutations provide a fundamental link between genotype and phenotype. Network perturbation by mutations facilitates grouping of distinct genotypes that share common effects on interaction profiles underlying a particular phenotype. In addition, the identified perturbed interaction partners allow us to uncover specific targets that are impaired in a mutation-specific context, which may in turn suggest therapeutic biomarkers guiding potential personalized precision medicine. However, the phenotypic diversity of cell types remains an important challenge for network biology: available network databases might not faithfully represent the particular cancer cell types to be investigated; furthermore, cell type composition and infiltration by stromal and immune system cells might affect network wiring.
Cell type specificity and tumour microenvironment need to be taken into account to obtain a comprehensive understanding of cancer mutations. Integration of context-dependent computational resources and improved algorithms would help to deconvolute the functional effects of mutations in a cell type-specific manner. In addition, emerging technologies such as single-cell experimental approaches could help further stratify mutational effects. To construct higher resolution functional networks, it would be essential to incorporate multiple properties of genetic mutations, including gene expression, protein folding and structure, protein–protein and protein–DNA interactions and beyond.
N.S. would like to acknowledge the following grants: the Cancer Prevention and Research Institute of Texas (CPRIT) New Investigator Grant RR160021, the University of Texas System Rising STARs award, the US National Institutes of Health (NIH)–National Cancer Institute (NCI) grants P30CA016672, U54HG008100 and U01CA168394, and the University Center Foundation via the Institutional Research Grant program at the University of Texas MD Anderson Cancer Center.
Node centered computational methods to characterize the function of cancer mutations.
- Missense mutations
(Also known as non-synonymous mutations). Nucleotide mutations in exons of protein-coding genes that cause amino acid substitutions in the protein.
- Frame-shift mutations
Nucleotide mutations in exons of protein-coding genes that cause an alteration to the reading frame of translation and usually result in a premature stop codon and a truncated or non-expressed protein. They typically involve small insertions or deletions of a number of nucleotides that is not divisible by three.
- Silent mutations
(Also known as synonymous mutations). Nucleotide mutations in exons of protein-coding genes that do not alter the coded amino acid (due to degeneracy in the genetic code).
- Nonsense mutations
Nucleotide mutations in exons of protein-coding genes that change amino acid-encoding codons into stop codons.
- ChIP-seq
(Chromatin immunoprecipitation followed by sequencing). Antibody-based immunoprecipitation of a chromatin-associated protein, such as a transcription factor (often epitope tagged) and its potentially interacting crosslinked DNA fragments, followed by sequencing to reveal the identity of these DNA fragments. Overall, this approach reveals the genomic sites of occupancy of the protein of interest.
- DNase-seq
Genome-wide sequencing of open chromatin regions that are sensitive to cleavage by DNase I. Open chromatin is enriched for regulatory sequences.
- Chromosome conformation capture
A method that analyses the spatial organization of chromatin in a cell by quantifying the interactions between genomic loci that are in proximity in three-dimensional space.
- Gene ontology
A unified representation of attributes for genes and gene products across species, which helps functional interpretation of experimental data.
- Topological centrality
In molecular interaction networks, topological centrality is an intrinsic network property that measures the overall position and 'connectedness' of a node in the networks.
- Nicking
Creating a single-strand DNA break.
Nature Communications (2018)