Age, smoking, viruses and radiation add to the burden of inherited and spontaneous mutations that influence cancer initiation and development, leaving their signatures on the cancer genome.

Main

Mutational landscape and significance across 12 major cancer types

Cyriac Kandoth, Michael McLellan et al. Nature 10.1038/nature12634

Mutation frequency, defined as the number of somatic mutations per Mb of sequence having sufficient coverage, was calculated for all 3,281 tumors. Figure 1a shows the distribution of mutation frequencies across the 12 cancer types.
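
The definition above reduces to a single division. The following is a minimal sketch of it; the function name and the example counts are hypothetical and are not taken from the study.

```python
def mutation_frequency_per_mb(n_somatic_mutations: int, covered_bases: int) -> float:
    """Somatic mutations per megabase (Mb) of sufficiently covered sequence."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_somatic_mutations / (covered_bases / 1_000_000)


# Hypothetical tumour: 150 somatic mutations over 30 Mb of covered sequence.
print(mutation_frequency_per_mb(150, 30_000_000))  # 5.0
```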

A comparison of the mutation spectra across the 12 cancer types (Fig. 1b) reveals that lung tumors (LUSC and LUAD) contain higher proportions of C→A transversions. Such mutations are classical signatures of exposure to cigarette smoke. We also examined the mutation sequence context (defined as the proportion of nucleotides in the 4 flanking positions of each mutation site from −2 to +2) across all 12 cancer types. [...] Out of 6 types of transitions and transversions, the flanking sequence context for 4 of them (C→A, A→T, A→G, and A→C) shows very high levels of correlation, suggesting very little difference in context.
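
A mutation spectrum of this kind can be sketched as follows. The sketch assumes that the six categories are obtained by collapsing each substitution onto the strand whose reference base is A or C, which matches the category names used above; the function names and the example mutations are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, alt: str) -> str:
    """Collapse a substitution onto the strand where the reference base is A or C."""
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref!r}>{alt!r}")
    if ref in "GT":
        # Report the equivalent change on the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def mutation_spectrum(mutations):
    """Proportion of each observed substitution class among (ref, alt) pairs."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in mutations)
    total = sum(counts.values())
    return {cls: counts[cls] / total for cls in sorted(counts)}


# Hypothetical calls: G>T is counted as C>A, and T>C as A>G.
example = [("C", "A"), ("G", "T"), ("C", "T"), ("T", "C")]
print(mutation_spectrum(example))  # {'A>G': 0.25, 'C>A': 0.5, 'C>T': 0.25}
```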

The largest difference in flanking context is observed for C→T transitions and C→G transversions. For example, BLCA, BRCA, and HNSC have similar mutation contexts for the latter: the frequency of the T nucleotide 1 bp upstream of the mutation site is dramatically higher than that observed in other cancer types (Supplementary Fig. 1). GBM, AML, COAD/READ, and UCEC have similar mutation contexts for C→T transitions: the proportions of the G nucleotide 1 base downstream of the mutation site are 67%, 59%, 65% and 63%, respectively. These proportions are substantially higher than those observed in other cancer types, which hover around 40%. BLCA has a unique signature for C→T transitions compared to the other cancer types (enriched for TC) (Supplementary Fig. 3).
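
Proportions such as "G 1 base downstream of a C→T site" can be computed from the five-base window (−2 to +2) around each mutation. The following is a minimal sketch under the assumption that each mutation is supplied as a window string plus the mutant base, and that windows centred on G or T are reverse-complemented first; all names and example data are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def oriented(context: str, alt: str):
    """Return (context, alt) on the strand where the central base is A or C."""
    centre = context[len(context) // 2]
    if centre in "GT":
        return context.translate(COMPLEMENT)[::-1], alt.translate(COMPLEMENT)
    return context, alt


def flank_proportion(mutations, ref, alt, offset, base):
    """Fraction of ref>alt mutations with `base` at `offset` from the mutated site."""
    hits = total = 0
    for context, observed_alt in mutations:
        ctx, mutant = oriented(context, observed_alt)
        mid = len(ctx) // 2
        if ctx[mid] == ref and mutant == alt:
            total += 1
            hits += ctx[mid + offset] == base
    return hits / total if total else float("nan")


# Hypothetical five-base windows; the last one lies on the opposite strand.
example = [("AACGT", "T"), ("TTCGA", "T"), ("GGCAT", "T"), ("ACGTT", "A")]
print(flank_proportion(example, "C", "T", +1, "G"))  # 0.75
```

An offset of −1 gives the upstream base instead, as in the T-before-C context described for C→G transversions.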

Signatures of mutational processes in human cancer

Ludmil Alexandrov et al. Nature 500, 515–521 (2013) 10.1038/nature12477

The catalogue of somatic mutations from a cancer genome bears the signatures of the mutational processes that have been operative. Here we analysed 4,938,362 mutations from 7,042 cancers and extracted more than 20 distinct mutational signatures.

There are signatures characterized by prominence of only one or two of the 96 possible substitution mutations, indicating remarkable specificity of mutation type and sequence context (signature 10). By contrast, others exhibit a more-or-less equal representation of all 96 mutations (signature 3). There are signatures characterized predominantly by C>T (signatures 1A/B, 6, 7, 11, 15, 19), C>A (4, 8, 18), T>C (5, 12, 16, 21) and T>G mutations (9, 17), with others showing distinctive combinations of mutation classes (2, 13, 14).
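
The 96 substitution types arise from six substitution classes, referred to the pyrimidine base, combined with the four possible bases on each side of the mutated position (6 × 4 × 4). A short sketch that enumerates them follows; the bracketed string notation and the function name are choices made here for illustration.

```python
from itertools import product

BASES = "ACGT"
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def mutation_types_96():
    """List every substitution in its trinucleotide context, e.g. 'T[C>T]A'."""
    return [
        f"{five_prime}[{substitution}]{three_prime}"
        for substitution in SUBSTITUTIONS
        for five_prime, three_prime in product(BASES, repeat=2)
    ]


types = mutation_types_96()
print(len(types))  # 96
print(types[0])    # A[C>A]A
```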

Signatures 1A and 1B were observed in 25 out of 30 cancer classes (Fig. 3).

An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers

Steven Roberts et al. Nature Genetics 45, 970–976 (2013) 10.1038/ng.2702

We show here that throughout cancer genomes APOBEC-mediated mutagenesis is pervasive and correlates with APOBEC mRNA levels. Mutation clusters in whole-genome and exome data sets conformed to the stringent criteria indicative of an APOBEC mutation pattern. Applying these criteria to 954,247 mutations in 2,680 exomes from 14 cancer types, mostly from The Cancer Genome Atlas (TCGA), showed a significant presence of the APOBEC mutation pattern in bladder, cervical, breast, head and neck, and lung cancers, reaching 68% of all mutations in some samples.

Indeed, the APOBEC mutation pattern was clearly present throughout many exomes, indicating that APOBEC enzymes were probably a major source of mutagenesis in these samples (Fig. 2a and Supplementary Table 4).

Our study and previous analyses suggest that the level of APOBEC3B transcription affects APOBEC-mediated mutagenesis. How higher APOBEC3B transcript levels are established remains unclear. Among the factors that could increase the amount of APOBEC protein(s) is the presence of the viral and retrotransposable elements that these enzymes restrict (refs 6, 33). Such factors can stimulate APOBEC expression through a complex network of innate immunity signaling, involving components such as Toll-like receptors, interferons, interleukins and even the 'usual suspect' in carcinogenesis, the p53 protein (refs 34–37). Infection with several viruses (ref. 38) as well as retrotransposition (ref. 39) are associated with carcinogenesis; however, the mechanisms of this association are far from clear. A potential relationship between APOBEC-mediated mutagenesis and viral infection is appealing, as cervical and head and neck cancers, which are highly associated with human papillomavirus (HPV) infection, showed strong enrichment of APOBEC-mediated mutagenesis.

Evidence for APOBEC3B mutagenesis in multiple human cancers

Michael Burns, Nuri Temiz & Reuben Harris Nature Genetics 45, 977–983 (2013) 10.1038/ng.2701

Enzyme-catalyzed DNA cytosine-to-uracil deamination is central to both adaptive and innate immune responses. B lymphocytes use activation-induced deaminase (AID) to create antibody diversity by inflicting uracil lesions in the variable regions of expressed immunoglobulin genes, which are ultimately processed into all six types of base-substitution mutations (somatic hypermutation; refs 15, 16). AID also creates uracil lesions in antibody gene switch regions that lead to DNA breaks and the juxtaposition of the expressed and often mutated variable region next to a new constant region (isotype switch recombination; refs 15, 16).

We therefore performed a global analysis of sequence signatures for all available cytosine mutation data from the top 50% of APOBEC3B-expressing tumors for each tumor type (this expression cutoff was chosen to minimize the impact of unrelated mutational mechanisms). These mutation data were first compiled and subjected to hierarchical cluster analysis to group tumors with similar cytosine mutation signatures (Fig. 3a).

We next separated each composite mutation distribution into the 16 individual local trinucleotide contexts to further resolve cytosine-focused mutational mechanisms that might influence each cancer. Bladder, cervical, lung squamous cell carcinoma, lung adenocarcinoma, head and neck, and breast cancers all shared strong bias for TCN mutation signatures (where N is any base), with the strongest bias for TCA of the four possibilities (Fig. 3b). A background of other mutations was apparent in the two types of lung cancer, possibly associated with tobacco carcinogens or other mutational mechanisms.
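
The split into 16 NCN contexts and the resulting TCN share can be sketched as follows. The sketch assumes that each cytosine mutation has already been reduced to its trinucleotide context on the cytosine strand; the function names and the example contexts are hypothetical.

```python
from collections import Counter
from itertools import product

# The 16 possible NCN trinucleotides centred on the mutated cytosine.
NCN_CONTEXTS = [f"{five}C{three}" for five, three in product("ACGT", repeat=2)]


def trinucleotide_proportions(contexts):
    """Proportion of cytosine mutations falling in each of the 16 NCN contexts."""
    for ctx in contexts:
        if ctx not in NCN_CONTEXTS:
            raise ValueError(f"not an NCN trinucleotide: {ctx!r}")
    counts = Counter(contexts)
    total = sum(counts.values())
    return {ctx: (counts[ctx] / total if total else 0.0) for ctx in NCN_CONTEXTS}


def tcn_share(contexts):
    """Combined proportion of mutations in TCA, TCC, TCG and TCT."""
    proportions = trinucleotide_proportions(contexts)
    return sum(p for ctx, p in proportions.items() if ctx.startswith("TC"))


# Hypothetical tumour with eight cytosine mutations.
example = ["TCA", "TCA", "TCT", "ACG", "GCC", "TCG", "CCA", "TCA"]
print(trinucleotide_proportions(example)["TCA"])  # 0.375
print(tcn_share(example))                         # 0.625
```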

Analysis of viral sequences in cancer genomes

The landscape of viral expression and host gene fusion and adaptation in human cancer

Ka-Wei Tang, Babak Alaei-Mahabadi, Tore Samuelsson, Magnus Lindh & Erik Larsson Nature Communications 10.1038/ncomms3513

We used two complementary approaches to detect and quantify expression of known and novel viruses in tumours (Fig. 1a, Methods).

We identified 178 tumours with FVR (viral expression) >2 p.p.m., but found that most positive cases had considerably higher levels (on average 168 and up to 854 p.p.m.; the complete results are available in Supplementary Data 1). As expected, CESC and LIHC showed the highest proportion of virus-positive tumours (96.6% and 32.4%, respectively, >2 p.p.m.), followed by head and neck squamous cell carcinoma (HNSC, 14.8%; Fig. 1b).
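
Viral expression in p.p.m. is the number of viral reads per million reads in the sequencing library. The sketch below computes it and applies the >2 p.p.m. and >10 p.p.m. cutoffs used in this study; the function names, the labels "strong", "weak" and "negative", and the read counts are hypothetical.

```python
def viral_ppm(viral_reads: int, total_reads: int) -> float:
    """Viral reads per million reads in the library."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return viral_reads * 1_000_000 / total_reads


def classify_detection(ppm: float) -> str:
    """Label a tumour using the >10 p.p.m. and >2 p.p.m. cutoffs."""
    if ppm > 10:
        return "strong"
    if ppm > 2:
        return "weak"
    return "negative"


# Hypothetical libraries of 50 million reads each.
for viral_reads in (8_400, 155, 20):
    ppm = viral_ppm(viral_reads, 50_000_000)
    print(ppm, classify_detection(ppm))
```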

The known tumour viruses HPV and HBV constituted the vast majority of strong signals >10 p.p.m. (90.5%; Fig. 1c).

Overall occurrences of HPV agreed closely with current knowledge: CESC showed 96.6% association with HPV, similar to recent large surveys (ref. 13; Fig. 2a).

Apart from matched normal liver samples with expected HBV (discussed below), only 2/404 normal tissue controls tested positive in this study, both with papillomavirus (Fig. 2a): one breast biopsy with low levels (3.1 p.p.m.) of a wart virus, HPV2, which expressed early as well as late genes indicative of active production of viral particles, and a normal kidney sample with HPV18 (12.9 p.p.m.), with viral gene expression similar to HPV in COAD/READ consistent with productive viral infection (Supplementary Fig. S2) but also with evidence of host–virus fusion (Fig. 2b).

figure 1

(a) Bean plot illustrating the distribution of mutation frequencies (defined as the number of mutations over the total number of bases having sufficient sequence coverage) across 12 cancer types. Dashed gray and solid white lines denote average mutation frequency across all cancer types and median mutation frequency for each cancer type, respectively. (b) Mutation spectrum showing the proportion of six transition and transversion categories for each cancer type.

figure 2

Cancer types are ordered alphabetically as columns whereas mutational signatures are displayed as rows. 'Other' indicates mutational signatures for which we were not able to perform validation or for which validation failed (Supplementary Figs 24–28). Prevalence in cancer samples indicates the percentage of samples from our data set of 7,042 cancers in which the signature contributed a significant number of somatic mutations. For most signatures, a significant number of mutations in a sample is defined as more than 100 substitutions or more than 25% of all mutations in that sample. MMR, mismatch repair.
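
The significance rule and the prevalence figure in this caption can be written as a short sketch. It covers only the general rule stated for most signatures; the function names and the per-sample counts are hypothetical.

```python
def contributes_significantly(signature_mutations: int, total_mutations: int) -> bool:
    """True if a signature accounts for >100 substitutions or >25% of a sample's mutations."""
    if total_mutations <= 0:
        return False
    return signature_mutations > 100 or signature_mutations / total_mutations > 0.25


def prevalence_percent(samples) -> float:
    """Percentage of samples in which the signature contributes significantly."""
    flagged = sum(contributes_significantly(sig, total) for sig, total in samples)
    return 100 * flagged / len(samples)


# Hypothetical (signature mutations, total mutations) pairs for four samples.
samples = [(80, 250), (120, 10_000), (50, 400), (10, 1_000)]
print(prevalence_percent(samples))  # 50.0
```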

figure 3

(a,b) Fold enrichment (a) and mutation load (b) of the APOBEC mutation pattern were determined in each of 2,680 whole exome–sequenced tumors representing 14 cancer types. Samples were categorized by the statistical significance of the APOBEC mutation pattern and the magnitude of enrichment. The significance of the APOBEC mutation pattern was calculated by one-sided Fisher's exact test comparing the ratio of the number of C-to-T or C-to-G substitutions and complementary G-to-A or G-to-C substitutions that occur in and out of the APOBEC target motif (TCW or WGA) to an analogous ratio for all cytosines or guanines that reside inside and outside of the TCW or WGA motif within a sample fraction of the genome (Benjamini-Hochberg–corrected q value < 0.05). The number of tumor samples in each category is presented in each pie chart in a. Samples with q value > 0.05 are represented in black. These samples are excluded from the scatter graphs in a,b. Color scales indicate the magnitude of enrichment in a and the number of APOBEC signature mutations in b for samples with q < 0.05. Dashed lines indicate effects expected with random mutagenesis. Cancer types are abbreviated as in TCGA: cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), bladder urothelial carcinoma (BLCA), head and neck squamous cell carcinoma (HNSC), breast invasive carcinoma (BRCA), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), uterine corpus endometrioid carcinoma (UCEC), ovarian serous cystadenocarcinoma (OV), stomach adenocarcinoma (STAD), rectum adenocarcinoma (READ), colon adenocarcinoma (COAD), prostate adenocarcinoma (PRAD), kidney renal clear-cell carcinoma (KIRC) and acute myeloid leukemia (LAML).
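
The enrichment test in this caption can be illustrated with a one-sided Fisher's exact test on a 2×2 table. The sketch assumes the table layout [[mutated cytosines in the motif, mutated cytosines outside it], [background cytosines in the motif, background cytosines outside it]]; that layout, the function names and the counts are assumptions made here, not the authors' code. The Benjamini-Hochberg correction across samples is not shown.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided P(X >= a) for the table [[a, b], [c, d]] with fixed margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # Sum the hypergeometric probabilities of tables at least as extreme as observed.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def fold_enrichment(mut_in: int, mut_out: int, ctx_in: int, ctx_out: int) -> float:
    """Share of mutations in the motif divided by the share of cytosines in the motif."""
    return mut_in * (ctx_in + ctx_out) / ((mut_in + mut_out) * ctx_in)


# Hypothetical sample: 30 of 50 mutated cytosines lie in the motif,
# against 100 of 1,000 background cytosines.
print(fold_enrichment(30, 20, 100, 900))              # 6.0
print(fisher_exact_greater(30, 20, 100, 900) < 0.05)  # True
```

Exact binomial coefficients become slow for genome-scale counts; for real data, `scipy.stats.fisher_exact` with `alternative="greater"` computes the same one-sided test.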

figure 4

(a) Dendrogram with web logos indicating the relationship among cancer types determined by the trinucleotide contexts of mutations occurring at cytosine nucleotides for the top 50% of APOBEC3B-expressing samples in each cancer type. Font sizes of the bases at the 5′ and 3′ positions are proportional to their observed occurrence in exome mutation data sets. The preferred mutation context for recombinant APOBEC3B from ref. 33 is included in hierarchical clustering to determine how closely each cancer's actual mutation spectrum matches the preferred motif for APOBEC3B in vitro. The pattern expected if the mutations were to occur at random cytosine bases in the exome is included as an inset at the bottom left. (b) Stacked bars indicate the observed proportion of cytosine mutations at each unique trinucleotide (NCN to NTN, NGN or NAN). The top six cancer types (highlighted by the solid box) show clear biases toward mutations within TCN motifs, at frequencies that resemble the preference of recombinant APOBEC3B in vitro (ref. 33). Skin cancer and the bottom seven cancers (highlighted by dashed boxes) have obviously different cytosine mutation spectra.

figure 5

(a) Analysis pipeline. Non-human reads were matched to a database of 3,590 RefSeq viral genomes, which was complemented with 12 additional known and 2 partial novel genomes detected by de novo assembly of viral reads. (b) Included cancer types and statistics. Bar graphs show fraction of tumours with strong viral expression (>10 p.p.m. viral reads in library) as well as weaker detections (2–10 p.p.m.). (c) Relative numbers of positive tumours for major virus categories, with strong and weak detections shown separately.

figure 6

(a) RNA-seq-derived expression levels for 28 viruses (vertical axis) detected at >2 p.p.m. of total library reads in at least one tumour, across 178 virus-positive tumours from 19 cancer types (horizontal axis). Viruses identified only because of sequence similarity with related strains were not included. (b) In addition to viral gene expression, genomic viral integration may have functional consequences. A large fraction of positive tumours identified in a carried viral integrations (top row), as evidenced by host-virus fusion transcripts in paired-end RNA-seq. Some genes showed recurrent integration in multiple tumours (six bottom rows). Integrations were quasi-randomly distributed across the genome (bottom chromosome plot) with some preferred loci. Select genes are shown for cytobands with recurrent integrations (number of tumours in parentheses). n/a, no paired-end data available.