A mutational signature in gastric cancer suggests therapeutic strategies

Targeting defects in the DNA repair machinery of neoplastic cells, for example, those due to inactivating BRCA1 and/or BRCA2 mutations, has been used for developing new therapies in certain types of breast, ovarian and pancreatic cancers. Recently, a mutational signature was associated with failure of double-strand DNA break repair by homologous recombination based on its high mutational burden in samples harbouring BRCA1 or BRCA2 mutations. In pancreatic cancer, all responders to platinum therapy exhibit this mutational signature including a sample that lacked any defects in BRCA1 or BRCA2. Here, we examine 10,250 cancer genomes across 36 types of cancer and demonstrate that, in addition to breast, ovarian and pancreatic cancers, gastric cancer is another cancer type that exhibits this mutational signature. Our results suggest that 7–12% of gastric cancers have defective double-strand DNA break repair by homologous recombination and may benefit from either platinum therapy or PARP inhibitors.

G ermline inactivating mutations in BRCA1 and/or BRCA2 cause an increased risk of early-onset breast 1,2 , ovarian 2,3 and pancreatic cancer 4 , while somatic mutations in these two genes and BRCA1 promoter hypermethylation have also been implicated in development of these cancer types 5,6 . BRCA1 and BRCA2 are involved in error-free homology directed doublestrand break repair 7 . Cancers with defects in BRCA1 and BRCA2 consequently show large numbers of rearrangements and indels due to error-prone repair by non-homologous end joining mechanisms, which assume responsibility for double-strand break repair 8,9 . While defective double-strand break repair increases the mutational burden of a cell, thus increasing the chances of acquiring somatic mutations that lead to neoplastic transformation, it also renders a cell more susceptible to cell cycle arrest and subsequent apoptosis when it is exposed to agents such as platinum-based antineoplastic drugs 10,11 . This susceptibility has been successfully leveraged for the development of targeted and less-toxic therapeutic strategies for treatment of breast, ovarian and pancreatic cancers harbouring BRCA1 and/or BRCA2 mutations, notably poly(adenosine diphosphate ribose) polymerase (PARP) inhibitors 10,11 . These treatments cause a multitude of DNA double-strand breaks that force neoplastic cells with defective BRCA1 and BRCA2 function into apoptosis since they lack the ability to effectively repair double-strand breaks. In contrast, normal cells remain mostly unaffected since their repair machinery is not compromised.
Exposure to exogenous or endogenous mutagens, abnormal DNA editing, the incomplete fidelity of DNA polymerases and failure of DNA repair mechanisms generate distinct combinations of somatic mutation types in cancer genomes [12][13][14] . We previously termed such patterns 'mutational signatures' and developed an approach for extracting mutational signatures from cancer genomes 15 . Our previous analysis revealed 21 distinct basesubstitution signatures across the spectrum of human cancer 12 . These base-substitution mutational signatures were described using a simple classification based on the six classes of single-base mutations: C4A, C4G, C4T, T4A, T4C and T4G (all substitutions are referred to by the pyrimidine of the mutated Watson-Crick base pair) in combination with the base immediately 5 0 and 3 0 to each mutation, thus resulting in 96 possible mutation types. Our previous analysis 12 revealed that cancers harbouring germline and/or somatic mutations in BRCA1 and BRCA2 exhibited a specific base-substitution signature, termed signature 3. The mutational pattern of signature 3 is shown in Fig. 1a. Interestingly, although almost all breast, ovarian and pancreatic cancers with BRCA1/2 mutations have large numbers of signature 3 mutations, a number of cancer cases lacking mutations in BRCA1 and BRCA2 or other genes known to play a role in double-strand break repair also exhibit the mutational signature 12 . This observation hinted towards the existence of other mechanisms that may be disabling homology directed double-strand DNA break repair.
A recent clinical analysis of pancreatic whole-genome sequencing data revealed that all samples responding to platinum therapy exhibited substantial numbers of signature 3 mutations 16 . This therapeutic response was also observed in a sample that lacked any germline or somatic BRCA1 or BRCA2 mutations, indicating that signature 3 itself could be used for decision support in allocating these therapies, even in the absence of BRCA1 or BRCA2 mutations.
In this study, we report a large-scale mutational signatures analysis aiming to identify the presence of signature 3 across human neoplasia. Our results reveal that, in addition to previously known cancer types, signature 3 is also present in 7-12% of gastric cancers. These gastric cancers most likely have defective homology directed double-strand DNA break repair and may benefit from either platinum therapy or PARP inhibitors.

Results
Large-scale survey of signature 3 across human neoplasia. We used a substantially elaborated version of our previously developed framework for deciphering mutational signatures (Methods) and analysed 7,329,860 somatic mutations from 10,250 pairs of cancer-normal samples derived from 36 distinct types of human cancer, including 607 whole-genome sequences and 9,643 whole-exome sequences (Supplementary Data 1). As expected, signature 3 was found in ovarian, breast and pancreatic cancers ( Fig. 1b; Supplementary Data 1). In ovarian cancer 143 of the examined 466 ovarian whole exomes (B30.7% of ovarian samples) exhibited signature 3. In breast cancer, signature 3 was found in 283 of the 1,051 whole-exome breast cancer sequences (B26.9%) and in 35 of the 119 whole-genome sequences (B29.4%). Whole-genome sequencing of 15 pancreatic cancers deliberately enriched for cases with BRCA1/2 mutations revealed the presence of signature 3 in six samples (40.0%), while examination of an unbiased set of 216 whole-exome sequenced pancreatic cancers identified signature 3 in 16 cases (B7.41%).
Remarkably, despite surveying another 33 distinct cancer types derived from diverse epithelial, mesenchymal, glial, haematopoietic and lymphoid cells, signature 3 was observed only in gastric cancer. The examined data for gastric cancer included 372 whole-exome and 100 whole-genome sequences (Supplementary Data 1). These data were derived from four independent previously published studies [17][18][19][20] . We were able to detect signature 3 in 27 whole exomes (B7.3% of the examined whole-exome gastric samples) and in 12 whole-genomes (12.0% of the examined whole-genome gastric samples). Some gastric samples harboured BRCA1 or BRCA2 somatic mutations, but there was no enrichment of signature 3 in samples with BRCA1/2 mutations. Most of these BRCA1/2 mutations were heterozygous and were found in cases with a very high prevalence of small indels and base substitutions due to defective DNA mismatch repair and are, therefore, highly likely to be passenger mutations. The contributions of all mutational signatures operative in the examined set of gastric cancers are provided in Supplementary Data 2, while the mutational signatures in the whole-genomesequenced samples are shown in Fig. 2. In addition, the genomic profiles of two gastric samples harbouring signature 3 (one sample with a BRCA2 mutation and another without a mutation in either BRCA1 or BRCA2) are shown in Fig. 3.
Patterns of indels and structural rearrangements. The presence of signature 3 mutations (and thus failure of DNA double-strand repair by homologus recombination) is closely associated with a particular pattern of large indels (longer than three base pairs) with overlapping microhomology at the deletion break points in breast, ovarian and pancreatic cancers. This pattern provides additional evidence of the absence of homologous recombinationbased repair and the role of non-homologous end-joining mechanisms. To evaluate further the significance of finding signature 3 mutations in gastric cancer, we searched for indels of this type. Whole-genome sequenced gastric cancers harbouring signature 3 had a median number of indels with overlapping microhomologies at break points of 715 compared with a median number of only 172 such indels in samples in which there was no evidence for signature 3 (Supplementary Data 3; Mann-Whitney U-test's P value ¼ 1.07 Â 10 À 5 ). Similarly, gastric whole-exomes exhibited a statistically significant elevation of large indels with overlapping microhomologies in samples in which signature 3 was found to be operative (Supplementary Data 4; Mann-Whitney U-test's P value ¼ 5.87 Â 10 À 4 ).
Breast, ovarian and pancreatic cancers with BRCA1 or BRCA2 mutations and/or signature 3 mutations show larger numbers of structural rearrangements than cases without, consistent with a deficiency in error-free double-strand break repair. We therefore compared the numbers of rearrangements in gastric cancers with and without signature 3 mutations. Whole-genome sequenced samples harbouring signature 3 had on average 244 structural variants versus 111 in samples that did not exhibit signature 3 ( Fig. 2; Supplementary Data 3; Mann-Whitney U-test's P value ¼ 1.24 Â 10 À 3 ). Thus, gastric cancers with signature 3 mutations bear other mutational hallmarks of failure of homology directed double-strand break repair, despite the absence of inactivating mutations in BRCA1 and BRCA2 genes.
Association between signature 3 and gastric cancer histology. Examining the histology of gastric samples with signature 3 revealed that they are enriched for the intestinal type by Lauren's classification (Table 1; Fisher's exact test's P value ¼ 5.80 Â 10 À 3 ) and have a tendency to display a distinctive compact discohesive growth pattern that looks like 'growth in cell suspension' (Fig. 3b,c; Table 1; Fisher's exact test's P value ¼ 3.00 Â 10 À 4 ). This pattern is characterized by solid nests of roundish malignant cells with marked loss in cell-to-cell adhesion (Fig. 3b,c). These nests are crowded together, which distinguishes them from the diffuse-type gastric cancer with widely infiltrative growth behaviour.
Presence of signature 3 in gastric cell lines. Our analysis of 10,250 primary cancers was also complemented by examination of the generally available set of gastric cell lines to provide a suitable model for testing the drug susceptibility of stomach cancers harbouring signature 3. In total, we examined wholeexome sequences of 20 gastric cell lines (Supplementary Data 5). Unfortunately, we were not able to identify a cell line in which signature 3 was present. This is perhaps unsurprising since the

Discussion
The results of this study provide the first comprehensive largescale survey of mutational signature 3 across human cancer. It should be noted that most of the analysed 10,250 samples were whole-exome sequenced (94% of samples) and it is possible that our survey was not able to detect the presence of signature 3 in cancer types and samples in which the signature generates low numbers of somatic mutations. Nevertheless, our analysis demonstrates that signature 3 is present in B7% of whole-exome sequenced gastric cancers as well as 12% of whole-genome sequenced gastric cancers. The results indicate that stomach cancers with signature 3 mutations may have defective homology directed double-strand DNA break repair. Only very limited clinical data were available for the examined gastric cancers restricting our opportunities for exploring correlations between the presence of signature 3 with disease progressions and outcome following treatment. Future studies, involving larger cohorts of samples with carefully curated clinically based data coupled with detailed histological data and complimented by functional analysis, will be necessary to further elaborate the connection between signature 3 and clinical response in gastric cancer.
Gastric cancer is the second most common cause of cancer-related deaths worldwide 21 . Since cancers with defective homology directed double-strand break repair due to BRCA1 or BRCA2 mutations are particularly sensitive to platinum therapy and PARP inhibitors, it is conceivable that this subset of gastric cancers might also benefit from their usage. Current gastric cancer chemotherapy protocols are variable and may include 5-fluorouracil-based therapy only or in combination with platinum-based drugs and other agents. Future precision medicine clinical studies with a focus on patient selection will be required to evaluate whether the presence of signature 3 substitutions, and the features of indels and rearrangements associated with it, might allow better patient selection for platinum-based drugs, and whether targeted therapies such as PARP inhibitors based on defective DNA double-strand break repair would also benefit patients with gastric cancer.

Methods
Curation of freely available somatic mutations of cancer samples. No data were generated specifically for the uses of this study. Rather, a large-scale data curation was performed with the goal of annotating the majority of freely available cancer genomes. Somatic mutations identified in 10,250 genome pairs (consisting of a cancer genome and the genome of a matched normal tissue) were curated. The curated data encompass 36 distinct types of cancer. In all, 607 of the 10,250 matched-normal pairs had their whole-genome sequenced, while the remaining 9,643 were subjected to whole-exome sequencing. Data were retrieved from three main sources: (i) the data portal of The Cancer Genome Atlas, (ii) the data portal of the International Cancer Genome Consortium and (iii) previously published data in peer-reviewed journals. Information for each sample, including its original data source, is provided in Supplementary Data 1. The somatic mutations for all examined samples are freely available and can be retrieved based on the information provided in Supplementary Data 1.
Filtering of somatic mutations and generating mutational catalogues. This study relies on previously sequenced samples from cancer and normal tissues, as well as from the subsequently used bioinformatics analyses to identify cancer tissue-specific somatic mutations. The analysed sequencing data were originally generated by a variety of different laboratories, leveraging different sequencing platforms and using a diverse set of mutation-calling algorithms. To remove any residual germline mutations as well as technology-, institute-and/or laboratoryspecific sequencing artefacts, extensive filtering was performed before analysing the data. Germline mutations were filtered out from the lists of reported somatic mutations using the complete list of germline mutations from dbSNP 22 , 1000 genomes project 23 , NHLBI GO Exome Sequencing Project 24 and 69 Complete Genomics panel (http://www.completegenomics.com/public-data/69-Genomes/). Technology-specific sequencing artefacts were filtered out by using panels of BAM files of (unmatched) normal tissues containing more than 250 normal whole genomes and 500 normal whole exomes. Any somatic mutation present in at least two well-mapping reads in at least two normal BAM files was discarded. The remaining somatic mutations constituted the mutational catalogue for every matched-normal pair. The immediate 5 0 and 3 0 sequence context for each somatic mutation was extracted using the ENSEMBL Core APIs for human genome build GRCh37. Curated somatic mutations that originally mapped to an older version of the human genome were re-mapped using UCSC's freely available lift genome annotations tool (https://genome.htseq.org/Bplantregulome/cgi-bin/hgLiftOver) 25 . Any somatic mutations with ambiguous or missing mappings were discarded from further analysis. The prevalence of somatic mutations in each sample was estimated based on a haploid human genome after all filtering was performed as previously done in ref. 12.
Estimating the contributions of mutational signatures in each sample. The mutational catalogues of all 10,250 samples were examined in two independent and distinct steps. Initially, de novo extraction based on somatic substitutions and their immediate sequence context was performed to derive the set of novel consensus mutational signatures. Briefly, mutational signatures were deciphered independently for each of the 36 cancer types using our previously developed computational MATLAB framework 15 . The computational framework for deciphering mutational signatures is freely available (Supplementary Software 1) and it can be also downloaded from: http://www.mathworks.com/matlabcentral/ fileexchange/38724. The algorithm deciphers the minimal set of mutational signatures that optimally explains the proportion of each mutation type found in each catalogue and then estimates the contribution of each signature to each mutational catalogue. Mutational signatures were also extracted separately for genomes and exomes. Mutational signatures extracted from exomes were normalized from the observed trinucleotide frequency in the human exome to the trinucleotide frequency of the human genome. All mutational signatures were clustered using unsupervised agglomerative hierarchical clustering and a threshold was selected to identify the set of consensus mutational signatures. Misclustering of signatures was avoided as previously described in ref. 12. A curated list of cancer census mutational signatures and their presence in human cancer can be found at our website: http://cancer.sanger.ac.uk/cosmic/signatures.
The de novo extraction was used to identify the complete set of consensus mutational signatures across the examined 10,250 samples. The next step of the analysis focused on accurately estimating the numbers of somatic mutations associated with each mutational signature in each sample. We usually refer to this The subset Q is determined based on the known operative mutational processes in the cancer type of the examined sample from the mutational signature extraction process described above. For example, for any neuroblastoma sample, Q will contain signatures 1, 5 and 18, as these are the only known signatures of mutational processes operative in neuroblastoma 12 . In equation (1) (1) is performed under several biologically meaningful linear constraints. The set of vectors in the examined set Q is constrained based on previously identified biological features of the consensus mutational signatures. For example, consensus signature 6 causes high levels of small insertions and/or deletions (indels) at mono/polynucleotide repeats 12 . Thus, this mutational signature will be excluded from the set Q when the mutational catalogue of an examined sample has only a few such indels. Similarly, there are signatures associated with other types of indels, transcriptional strand bias, dinucleotide mutations, hypermutator phenotypes and so on, and these signatures are included in the set Q only when the sample in question exhibits one or more of these features. Lists of features associated with different mutational signatures can be found in ref. 12 as well as at our website: http://cancer.sanger.ac.uk/cosmic/ signatures. In addition to sample-specific constraints to the set Q, equation (1) was universally constrained in regards to the parameter E i . More specifically, the number of somatic mutations contributed by a mutational signature in a sample must be non-negative and it must not exceed the total number of somatic mutations in that sample. Furthermore, the mutations contributed by all signatures in a sample must equal the total number of somatic mutations of that sample. These constraints can be mathematically expressed as 0 E i kS i ! k 1 ; i ¼ 1::q, and P q i¼1 E i ¼k S i ! k 1 . The results for the contributions of mutational signature 3 in all 10,250 samples from the hitherto described approach are provided in Supplementary Data 1.
Factors that influence extraction of mutational signatures. We have previously used results from simulated data to describe a plethora of factors that influence the accuracy of the extraction of mutational signatures 15 . Such factors include the number of available samples, the number of somatic mutations in a sample, the number of mutations contributed by different mutational signatures, the similarity between the patterns of the signatures of mutational processes operative in cancer samples, as well as the computational limitations of our framework. Nevertheless, in the past 3 years, our framework has proven robust and has described multiple similar and validated signatures across the spectrum of human cancer 8,12,14,[26][27][28][29][30][31] .