A practical framework and online tool for mutational signature analyses show intertissue variation and driver dependencies

An Author Correction to this article was published on 17 June 2020

This article has been updated

Abstract

Mutational signatures are patterns of mutations that arise during tumorigenesis. We present an enhanced, practical framework for mutational signature analyses. Applying these methods to 3,107 whole-genome-sequenced (WGS) primary cancers of 21 organs reveals known signatures and nine previously undescribed rearrangement signatures. We highlight interorgan variability of signatures and present a way of visualizing that diversity, reinforcing our findings in an independent analysis of 3,096 WGS metastatic cancers. Signatures with a high level of genomic instability are dependent on TP53 dysregulation. We illustrate how uncertainty in mutational signature identification and assignment to samples affects tumor classification, reinforcing that using multiple orthogonal mutational signature data is not only beneficial, but is also essential for accurate tumor stratification. Finally, we present a reference web-based tool for cancer and experimentally generated mutational signatures, called Signal (https://signal.mutationalsignatures.com), that also supports performing mutational signature analyses.

Fig. 1: Optimization of signature extraction framework.
Fig. 2: Relationships between organ-specific mutational signatures.
Fig. 3: Relating rearrangement mutational signatures obtained from independent organ-wise extractions.
Fig. 4: Comparison of organ-specific signatures in two cohorts.
Fig. 5: Fit of mutational signatures per sample.
Fig. 6: Signature-driver associations.
Fig. 7: Prediction of HRD across 21 organs.
Fig. 8: Analysis of samples with high mutational burden of RefSig 3 and overview of the signature analysis framework.

Data availability

Previously published WGS data that were reanalyzed here are available under accession codes EGAS00001001178 (a dataset of 560 breast cancers), EGAD00001002740 (a dataset of 80 breast cancers) and EGAS00001001692 (ICGC PCAWG). WGS data from the Hartwig cohort can be accessed from www.hartwigmedicalfoundation.nl/en. Signature networks are available as an interactive visualization at the web link https://signal.mutationalsignatures.com/explore/cancer/network.

Numerical source data for Fig. 1 and Extended Data Figs. 5a–h and 10 have been provided as Source Data file 1. Numerical source data for Figs. 2–8 and Extended Data Figs. 1–9 can be found in Supplementary Tables 1–113. All other data supporting the findings of this study are available from the corresponding author on reasonable request.

Code availability

Signature extraction and fit code is freely available as an R package at https://github.com/Nik-Zainal-Group/signature.tools.lib. Additional R scripts used to perform the analysis are provided as Supplementary Code and are also available on GitHub at https://github.com/Nik-Zainal-Group/DegasperiEtAl-NatureCancer2020-SupplCode.

Change history

  • 17 June 2020

    An amendment to this paper has been published and can be accessed via a link at the top of the paper.

References

  1. Greenman, C. et al. Patterns of somatic mutation in human cancer genomes. Nature 446, 153–158 (2007).
  2. Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).
  3. Van Hoeck, A., Tjoonk, N. H., van Boxtel, R. & Cuppen, E. Portrait of a cancer: mutational signature analyses for cancer diagnostics. BMC Cancer 19, 457 (2019).
  4. Alexandrov, L. B., Nik-Zainal, S., Wedge, D. C., Campbell, P. J. & Stratton, M. R. Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 3, 246–259 (2013).
  5. Baez-Ortega, A. & Gori, K. Computational approaches for discovery of mutational signatures in cancer. Brief Bioinform. 20, 77–88 (2019).
  6. Kim, J. et al. Somatic ERCC2 mutations are associated with a distinct genomic signature in urothelial tumors. Nat. Genet. 48, 600–606 (2016).
  7. Lee, D. D. & Seung, H. S. Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999).
  8. Lee, D. D. & Seung, H. S. Algorithms for non-negative matrix factorization. Adv. Neural Inform. Proc. Syst. 13, 556–562 (2001).
  9. Pascual-Montano, A., Carazo, J. M., Kochi, K., Lehmann, D. & Pascual-Marqui, R. D. Nonsmooth nonnegative matrix factorization (nsNMF). IEEE Trans. Pattern Anal. Mach. Intell. 28, 403–415 (2006).
  10. Campbell, P. J., Getz, G., Stuart, J. M., Korbel, J. O. & Stein, L. D. Pan-cancer analysis of whole genomes. Preprint at bioRxiv https://doi.org/10.1101/162784 (2017).
  11. Davies, H. et al. HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures. Nat. Med. 23, 517–525 (2017).
  12. Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).
  13. Alexandrov, L. et al. The repertoire of mutational signatures in human cancer. Preprint at bioRxiv https://doi.org/10.1101/322859 (2018).
  14. Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216 (2019).
  15. Huang, X., Wojtowicz, D. & Przytycka, T. M. Detecting presence of mutational signatures in cancer with confidence. Bioinformatics 34, 330–337 (2017).
  16. Sabarinathan, R. et al. The whole-genome panorama of cancer drivers. Preprint at bioRxiv https://doi.org/10.1101/190330 (2017).
  17. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
  18. Popova, T. et al. Ovarian cancers harboring inactivating mutations in CDK12 display a distinct genomic instability pattern characterized by large tandem duplications. Cancer Res. 76, 1882–1891 (2016).
  19. Willis, N. A. et al. Mechanism of tandem duplication formation in BRCA1-mutant cells. Nature 551, 590–595 (2017).
  20. Polak, P. et al. A mutational signature reveals alterations underlying deficient homologous recombination repair in breast cancer. Nat. Genet. 49, 1476–1486 (2017).
  21. Morganella, S. et al. The topography of mutational processes in breast cancer genomes. Nat. Commun. 7, 11383 (2016).
  22. Glodzik, D. et al. A somatic-mutational process recurrently duplicates germline susceptibility loci and tissue-specific super-enhancers in breast cancers. Nat. Genet. 49, 341–348 (2017).
  23. Bertucci, F. et al. Genomic characterization of metastatic breast cancers. Nature 569, 560–564 (2019).
  24. Rheinbay, E. et al. Discovery and characterization of coding and non-coding driver mutations in more than 2,500 whole cancer genomes. Preprint at bioRxiv https://doi.org/10.1101/237313 (2017).
  25. Ferreira, A. M. et al. High frequency of RPL22 mutations in microsatellite-unstable colorectal and endometrial tumors. Hum. Mutat. 35, 1442–1445 (2014).
  26. Henderson, S., Chakravarthy, A., Su, X., Boshoff, C. & Fenton, T. R. APOBEC-mediated cytosine deamination links PIK3CA helical domain mutations to human papillomavirus-driven tumor development. Cell Rep. 7, 1833–1841 (2014).
  27. Li, Z. et al. Loss of the FAT1 tumor suppressor promotes resistance to CDK4/6 inhibitors via the hippo pathway. Cancer Cell 34, 893–905 e898 (2018).
  28. Zhao, E. Y. et al. Homologous recombination deficiency and platinum-based therapy outcomes in advanced breast cancer. Clin. Cancer Res. 23, 7521–7530 (2017).
  29. Staaf, J. et al. Whole-genome sequencing of triple-negative breast cancers in a population-based clinical study. Nat. Med. 25, 1526–1533 (2019).
  30. Kucab, J. E. et al. A compendium of mutational signatures of environmental agents. Cell 177, 821–836 e816 (2019).
  31. Gaujoux, R. & Seoighe, C. A flexible R package for nonnegative matrix factorization. BMC Bioinf. 11, 367 (2010).
  32. Wagstaff, K. et al. in Proc. Eighteenth International Conference on Machine Learning (eds., Brodley, C.E. and Pohoreckyj Danyluk, A.) 577–584 (Morgan Kaufmann Publishers, 2001).
  33. Martello, S. & Toth, P. in North-Holland Mathematics Studies (eds., Martello, S., Laporte, G., Minoux, M. & Ribeiro, C.) 132, 259–282 (Elsevier, 1987).
  34. Abkevich, V. et al. Patterns of genomic loss of heterozygosity predict homologous recombination repair defects in epithelial ovarian cancer. Brit. J. Cancer 107, 1776–1782 (2012).


Acknowledgements

This work was funded by a Cancer Research UK (CRUK) Pioneer Award (grant no. C60100/A23433), Wellcome-Beit Prize, CRUK Advanced Clinician Scientist Fellowship (grant no. C60100/A23916), Wellcome Trust Strategic Award (WT100126/B/13/Z) and CRUK PRECISION Grand Challenge award. We thank J. Zamora, Y. Xu, D. R. Jones, R. Harris and S. P. Jackson for their support in the development of the Signal website. We would also like to thank the International Cancer Genome Consortium for access to WGS primary cancer data. This work has been facilitated by Hartwig Medical Foundation (HMF) and the Center for Personalized Cancer Treatment (CPCT), which have generated and made available metastatic whole cancer genome data for this research.

Author information

Contributions

S.N.Z. conceived the project. A.D. designed and performed the analysis, and developed the algorithms. A.D. and S.N.Z. interpreted the results and wrote the manuscript. S.N.Z., H.D., G.K., C.B., S.E.M. and J.Y. critically assessed the biological soundness of methods and results. D.G., X.Z., T.D.A., A.S.N., S.M., S.S., J.C., I.G.S., Y.M. and J.M.L.D. contributed to algorithm development and testing. A.D. and Y.M. implemented the algorithms in an R package. S.S. and J.C. developed the Signal web tool, implemented part of the analysis framework online and contributed to the manuscript.

Corresponding author

Correspondence to Serena Nik-Zainal.

Ethics declarations

Competing interests

S.N.Z., D.G. and H.D. are inventors on a patent application on HRDetect. All other authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Optimal relative tolerance for the selection of best NMF runs.

(a-f) Repeated NMF application to the Breast 560 dataset (560 patient samples and 1000 NMF runs) of SNV organized in 96 channels. a, Distribution of the optimal Kullback-Leibler divergence (KLD) obtained from 1000 NMF runs (n = 1000) and for different number of k mutational signatures extracted (k from 9 to 13). Red vertical lines indicate the best (lowest) KLD, the 0.1% relative tolerance (RTOL) from best and the 1% RTOL from best. b, Convergence of global minimum for different k values. The 1000 values of optimal KLD in (a) are randomly ordered 50 times and the minimum KLD after each run is computed for each ordered sequence. Average (solid lines) and standard deviation (dotted lines) are then plotted. Red horizontal lines indicate the best KLD and 0.1% RTOL from best. c, The same KLD values from the five plots in (a) are combined in one single plot. (d-f) PCA plots of mutational signatures obtained from the Breast 560 catalogue, with number of signatures k = 10. In each row, three plots show principal components (PC) 1 with 2, 1 with 3 and 2 with 3, using the same projection of the first row. Colors indicate clusters computed with the clustering with matching algorithm, triangles are the medoids of the clusters and on top of the triangles the most similar COSMIC signatures (or sum of signatures), according to cosine similarity, are indicated. A black line connects the two closest medoids according to cosine similarity. The cosine similarity of the two closest medoids (max cos sim of medoids) and the average silhouette width (ASW) are indicated for each row. d, PCA plot obtained using 1000 NMF runs (n = 1000). e, PCA plot obtained using only the NMF runs within 1% RTOL from the best run. f, PCA plot obtained using only the NMF runs within 0.1% RTOL from the best run. The 1000 NMF runs used in this plot are the same as in panel (a) (k = 10). (g-h) Repeated NMF application to the Breast 560 dataset and additional PCAWG datasets. 
ASW of clustering mutational signatures from best NMF runs for different values of relative tolerance (g) or different numbers of total NMF runs (h). g, For each of the six datasets and for different numbers of mutational signatures (n sig), multiple NMF runs are performed (1000 for Breast 560 and 500 for the others). A relative tolerance (RTOL) with respect to the best (lowest) optimization function value obtained is used to select a subset of best NMF runs, that is, all runs with optimization function value less than or equal to best*(1 + RTOL). For each selected set of best runs, the obtained signatures are clustered using clustering with matching, and the ASW is computed. The six plots show the value of the ASW for different values of RTOL and numbers of signatures extracted (n sig). h, For each of the six datasets and for different numbers of mutational signatures (n sig), multiple NMF runs are performed (plot x axis). A relative tolerance (RTOL = 0.1%) with respect to the best (lowest) optimization function value obtained is used to select a subset of best NMF runs, that is, all runs with optimization function value less than or equal to best*(1.001). For each selected set of best runs, the obtained signatures are clustered using clustering with matching, and the ASW is computed. The six plots show the average of n = 10 replicates of the ASW for different numbers of total NMF runs performed and numbers of signatures extracted (n sig). Detailed data for the analyses shown in this figure can be found in Supplementary Tables 1 and 112.
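The run-selection rule in this caption (keep every run whose objective value is at most best*(1 + RTOL)) can be illustrated with a minimal Python sketch. This is not the authors' R implementation; the function name `select_best_runs` and the KLD values are hypothetical, and the sketch assumes the objective values are positive, as a KLD of a non-perfect fit would be.

```python
def select_best_runs(objective_values, rtol=0.001):
    """Return indices of runs within a relative tolerance of the best run.

    objective_values: final optimization values (for example KLD), one per
    NMF run; lower is better and values are assumed to be positive.
    rtol: relative tolerance, e.g. 0.001 for 0.1% or 0.01 for 1%.
    """
    best = min(objective_values)
    cutoff = best * (1.0 + rtol)  # best*(1 + RTOL), as in the caption
    return [i for i, v in enumerate(objective_values) if v <= cutoff]


# Hypothetical KLD values from five NMF runs (illustrative numbers only).
kld = [1000.0, 1000.5, 1002.0, 1009.0, 1050.0]
tight = select_best_runs(kld, rtol=0.001)  # 0.1% RTOL
loose = select_best_runs(kld, rtol=0.01)   # 1% RTOL
print(tight, loose)
```

A tighter tolerance keeps fewer runs, which is the trade-off panels g and h explore through the average silhouette width of the resulting signature clusters.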

Extended Data Fig. 2 Global Signature Extraction.

24 substitution signatures obtained from a signature extraction pooling 2,486 tumours across 21 organs. Several artefactual signatures are present among these signatures (details of the analysis can be found in Supplementary Notes). Detailed data for the analyses shown in this figure can be found in Supplementary Table 112.

Extended Data Fig. 3 Substitution signatures extracted per organ from 3107 samples, part 1.

First part of 192 organ-specific substitution signatures, obtained across 21 organs (n = 3107 samples). Signature names have letters or numbers (between parentheses), where numbers indicate the associated reference signatures as defined by the conversion matrix (Methods and Supplementary Table 9). Detailed data for the analyses shown in this figure can be found in Supplementary Tables 2 and 112.

Extended Data Fig. 4 Substitution signatures extracted per organ from 3107 samples, part 2.

Second part of 192 organ-specific substitution signatures, obtained across 21 organs (n = 3107 samples). Signature names have letters or numbers (between parentheses), where numbers indicate the associated reference signatures as defined by the conversion matrix (Methods and Supplementary Table 9). Detailed data for the analyses shown in this figure can be found in Supplementary Tables 2 and 112.

Extended Data Fig. 5 Rearrangement signatures extracted per organ from 3107 samples, part 1.

First part of 116 organ-specific rearrangement signatures, obtained across 19 organs (n = 3021 samples). Signature names have letters or numbers (between parentheses), where numbers indicate the associated reference signatures as defined by the conversion matrix (Methods and Supplementary Table 10). Detailed data for the analyses shown in this figure can be found in Supplementary Tables 3 and 113.

Extended Data Fig. 6 Rearrangement signatures extracted per organ from 3107 samples, part 2.

Second part of 116 organ-specific rearrangement signatures, obtained across 19 organs (n = 3021 samples). Signature names have letters or numbers (between parentheses), where numbers indicate the associated reference signatures as defined by the conversion matrix (Methods and Supplementary Table 10). Detailed data for the analyses shown in this figure can be found in Supplementary Tables 3 and 113.

Extended Data Fig. 7 Hierarchical clustering of substitution mutational signatures obtained from organ-wise independent extraction.

a, Clustering of organ-wise extracted substitution signatures, using hierarchical clustering with average linkage and 1 – cosine similarity as distance metric. Red boxes indicate the identified signature groups and the corresponding group reference signature is indicated at the bottom. b, Reference signatures for each group in (a), given as the mean and standard error of the signatures in each group. Cosine similarity with the most similar COSMIC signature is indicated, along with the number of signatures in the group (n) and the reference signature name. c, Cosine similarity between the reference signature of each group and the individual signatures that belong to each group. Group sizes are the same as in panel (b). Boxes show median, 1st and 3rd quartile, with whiskers extending at most 1.5·IQR. Detailed data for the analyses shown in this figure can be found in Supplementary Tables 2 and 4.
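As an illustration of the clustering described in panel a, the following Python sketch performs naive agglomerative clustering with average linkage and 1 – cosine similarity as the distance. It is a simplified stand-in, not the authors' code: the function names, the toy three-channel "signatures" and the `max_distance` stopping rule are all assumptions made for the example, and the caption does not state how the signature groups were delimited.

```python
import math


def cosine_distance(a, b):
    """Distance defined as 1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def average_linkage_clusters(signatures, max_distance):
    """Merge clusters greedily until the closest pair exceeds max_distance.

    max_distance is a hypothetical cut-off chosen for this sketch only.
    """
    clusters = [[i] for i in range(len(signatures))]

    def linkage(c1, c2):
        # Average linkage: mean pairwise distance between cluster members.
        total = sum(cosine_distance(signatures[i], signatures[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d > max_distance:
            break
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy three-channel profiles (illustrative only; real catalogues use 96 channels).
toy = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
groups = average_linkage_clusters(toy, max_distance=0.1)
print(groups)
```

A group reference signature in the sense of panel b would then be the channel-wise mean of the members of each returned group.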

Extended Data Fig. 8 Hierarchical clustering of rearrangement mutational signatures obtained from organ-wise independent extraction.

a, Clustering of organ-wise extracted rearrangement signatures, using hierarchical clustering with average linkage and 1 – cosine similarity as distance metric. Red boxes indicate the identified signature groups and the corresponding group reference signature is indicated at the bottom. b, Reference signatures for each group in (a), given as the mean and standard error of the signatures in each group. Cosine similarity with the most similar rearrangement signature from Nik-Zainal et al. 2016 is indicated, along with the number of signatures in the group (n) and the reference signature name. c, Cosine similarity between the reference signature of each group and the individual signatures that belong to each group. Group sizes are the same as in panel (b). Boxes show median, 1st and 3rd quartile, with whiskers extending at most 1.5·IQR.

Extended Data Fig. 9 Similarity between substitution reference signatures and single base substitution COSMIC signatures.

Cosine similarity between the substitution reference signatures (Extended Data Fig. 7b) and the COSMIC single base substitution (SBS) signatures was computed. Red squares indicate a cosine similarity higher than 0.9; blue squares indicate a cosine similarity between 0.85 and 0.9; white squares indicate a cosine similarity lower than 0.85. Detailed data for the analyses shown in this figure can be found in Supplementary Table 4.
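The comparison in this figure rests on cosine similarity between two signature profiles, followed by a three-way colour coding. A minimal Python sketch is given here; the function names are hypothetical, and the treatment of values exactly equal to 0.85 or 0.9 is an assumption, since the caption does not specify it.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_category(cos_sim):
    """Map a cosine similarity to the colour scheme described in the caption.

    Boundary handling (0.85 and 0.9 counted as blue) is an assumption.
    """
    if cos_sim > 0.9:
        return "red"
    if cos_sim >= 0.85:
        return "blue"
    return "white"


# Toy profiles (illustrative only): identical and orthogonal vectors.
print(similarity_category(cosine_similarity([1.0, 0.0], [1.0, 0.0])))
print(similarity_category(cosine_similarity([1.0, 0.0], [0.0, 1.0])))
```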

Extended Data Fig. 10 Simulated experiments, mutational signature assignment to samples.

A total of 30 mutational catalogues were simulated (n = 5 replicate simulated datasets) using 10 COSMIC signatures. Mutational signatures were extracted using different approaches, and assigned to the 30 samples using a bootstrap signature fit approach. Each sample was bootstrapped 100 times and each time signature activity in the sample was estimated by optimizing the Kullback-Leibler divergence (KLD). A consensus activity is then computed for each sample as the median of the results, and the sparsity of the activity is then increased by setting to zero activities that are not statistically higher than a given threshold (thresholds of 0, 1, 2, 5 and 10% of the total number of mutations, with p-value 0.05; that is, an activity is set to zero if more than 5% of the bootstrap runs are below the threshold). The correct number of signatures k = 10 is used. Filter: only the best NMF runs are considered; no filter: all NMF runs are used; Lee KLD: Lee and Seung 2001 multiplicative algorithm with KLD; Lee–Frobenius: Lee and Seung 2001 with Frobenius norm; nsNMF: non-smooth NMF with KLD; HC: hierarchical clustering with average linkage; CM: clustering with matching; PAM: partitioning around the medoids. a, Root mean squared error (RMSE) between original mutation assignment matrix and the fitted model. b, Sensitivity of signature assignment. c, Specificity of signature assignment. Detailed data for the analyses shown in this figure can be found in Source Data File 1.
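The consensus and sparsification step of the bootstrap fit can be sketched in Python as follows. The sketch takes the per-bootstrap activities as given (the KLD optimization that produces them is not shown) and is not the authors' R implementation; `consensus_exposures`, its parameters and the example numbers are hypothetical.

```python
import statistics


def consensus_exposures(bootstrap_exposures, total_mutations,
                        threshold_percent=5.0, p_value=0.05):
    """Median activity per signature, zeroed when not reliably above threshold.

    bootstrap_exposures: list of bootstrap runs, each a list with one
    estimated activity (number of mutations) per signature.
    An activity is set to zero if more than a fraction p_value of the runs
    fall below threshold_percent of total_mutations.
    """
    n_runs = len(bootstrap_exposures)
    n_signatures = len(bootstrap_exposures[0])
    threshold = threshold_percent / 100.0 * total_mutations
    consensus = []
    for s in range(n_signatures):
        values = [run[s] for run in bootstrap_exposures]
        fraction_below = sum(1 for v in values if v < threshold) / n_runs
        if fraction_below > p_value:
            consensus.append(0.0)
        else:
            consensus.append(statistics.median(values))
    return consensus


# Ten hypothetical bootstrap fits of two signatures in a 1,000-mutation sample.
runs = [[600, 60]] * 8 + [[610, 10]] * 2
result = consensus_exposures(runs, total_mutations=1000, threshold_percent=5.0)
print(result)
```

In this toy case the second signature falls below the 5% threshold in 2 of 10 runs, so its consensus activity is set to zero, while the first keeps its median.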

Supplementary information

Supplementary Information

Supplementary Notes and Figs. 1–3.

Reporting Summary

Supplementary Tables

Tables containing the list of samples (Table 1), organ-specific mutational signatures (Tables 2 and 3), the reference signatures (Tables 4 and 5), the exposures of the organ-specific signatures in each sample (Tables 13–52), the conversion matrices used to convert organ-specific signature exposures into reference signature exposures (Tables 9 and 10), the reference signature exposures (Tables 6 and 7) and the HRDetect data and scores (Table 8). Supplementary Tables 53–110 contain the results of the drivers-signatures associations analysis, including sample sizes and results of two-sided Fisher exact tests. Supplementary Table 111 contains the organ-specific signatures extracted from the Hartwig metastatic cohort. Supplementary Tables 112 and 113 contain the full list of ICGC-PCAWG substitution and rearrangement catalogs.

Supplementary Software

R script files with the code used to perform the analysis in the article.

Source data

Source Data Fig. 1

Simulated mutational catalogs and signature exposures.

Cite this article

Degasperi, A., Amarante, T.D., Czarnecki, J. et al. A practical framework and online tool for mutational signature analyses show intertissue variation and driver dependencies. Nat Cancer 1, 249–263 (2020). https://doi.org/10.1038/s43018-020-0027-5
