a, BoostDM models can be used to interpret the mutations observed in a newly sequenced tumour genome. The classifications and their explanations (represented as radial plots) may be useful to an expert who also has other ancillary information. b, Number of models of each type (gene–tumour type-specific, gene-specific or gene–other tumour type) selected across tumour types in a pan-cancer cohort of about 28,000 tumours to classify mutations observed in cancer genes (numbers below the plot). At least one model is available to classify mutations across 2,080 cancer gene–tissue combinations. c, Fraction of all mutations in cancer genes (variants of unknown significance, or VUSs) that are covered by each type of model across tumour types. Fractions for the entire cohort appear in the right-hand barplot. Specifically, 14,757 VUSs (26%) are classified by specific boostDM models, a further 2,588 (4%) may be classified by models trained by pooling mutations and features of several related tumour types, and 20% more may be classified by a model trained on a different tumour type. In the entire cohort, 28,080 VUSs (about 50%) are covered for interpretation by at least one boostDM model. These fractions are compared with the fraction of VUSs that can be interpreted using two curated datasets of known oncogenic mutations (ClinVar and OncoKB). d, Comparison of the number of driver mutations identified by boostDM per sample across the cohort with the number of excess mutations (over the expectation under the hypothesis of neutrality) identified by a dN/dS approach. The numbers identified by boostDM are of the same order of magnitude as the numbers of driver mutations predicted by dNdScv. Points, median; bars, 10th to 90th percentiles of the distribution. e, The interpretation of newly sequenced tumour genomes is implemented within the Cancer Genome Interpreter platform. VUSs not covered by boostDM models are interpreted following a simple rule-based approach (OncodriveMut).
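The tiered coverage described in panels b, c and e can be illustrated with a minimal Python sketch. It is not the boostDM or Cancer Genome Interpreter implementation: the registries, the gene and tumour-type labels, the function names and the priority order (specific model first, then pooled model, then a model from another tumour type, then the rule-based fallback) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical registries of available models (illustrative values only).
SPECIFIC = {("TP53", "BRCA"), ("KRAS", "LUAD")}  # gene-tumour type-specific models
POOLED = {"PIK3CA"}                              # genes with a model pooled over related tumour types
OTHER_TUMOUR_TYPE = {"KRAS"}                     # genes with a model trained on some other tumour type


def select_model_tier(gene: str, tumour_type: str) -> str:
    """Return the assumed model tier used to classify a mutation."""
    if (gene, tumour_type) in SPECIFIC:
        return "specific"
    if gene in POOLED:
        return "pooled"
    if gene in OTHER_TUMOUR_TYPE:
        return "other_tumour_type"
    # No model available: fall back to a rule-based interpretation.
    return "rule_based"


def coverage(mutations: list[tuple[str, str]]) -> tuple[Counter, float]:
    """Count mutations per tier and return the fraction covered by any model."""
    tiers = Counter(select_model_tier(gene, tt) for gene, tt in mutations)
    covered = sum(n for tier, n in tiers.items() if tier != "rule_based")
    fraction = covered / len(mutations) if mutations else 0.0
    return tiers, fraction


# Toy input: (gene, tumour type) pairs for observed mutations.
toy_mutations = [
    ("TP53", "BRCA"),
    ("KRAS", "LUAD"),
    ("KRAS", "COAD"),
    ("PIK3CA", "BRCA"),
    ("TTN", "BRCA"),
]
tiers, fraction_covered = coverage(toy_mutations)
```

On this toy input, two mutations fall in the specific tier, one in each of the pooled and other-tumour-type tiers, and one is left to the rule-based fallback. The per-tier counts play the role of the stacked fractions in panel c.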