a, Left, boostDM scores of rare, experimentally validated oncogenic and benign variants13,14, and absolute performance of boostDM models (precision, recall and F50). Right, performance of specific models; numbers in parentheses give the sizes of the positive and negative mutation sets. b, Performance of boostDM models (precision, recall and F50) in discriminating pathogenic somatic variants from benign germline variants (ClinVar). c, Distribution of predicted drivers among polymorphisms with different allele frequencies in the population. Bars represent the effect size of a logistic regression of the categorical frequency of the polymorphisms on their boostDM classification. A positive effect size marks a gene (or, in red, the pooled set of cancer genes) in which very rare polymorphisms are more likely to be classified as drivers by boostDM models; the P value of the logistic regression for polymorphisms across all cancer genes is shown. Boxplots: centre line, median; box limits, first and third quartiles; whiskers, lowest/highest data points within the first quartile minus 1.5× IQR and the third quartile plus 1.5× IQR.
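The quantities named in this caption can be made concrete with a short sketch. It assumes that "F50" denotes the F-beta score with beta = 0.5 (weighting precision above recall); that reading is an assumption here, not a definition taken from this caption. The function names `precision_recall_fbeta` and `tukey_whiskers` are hypothetical helpers for illustration, and the whisker helper follows the conventional Tukey rule (most extreme points within Q1 − 1.5×IQR and Q3 + 1.5×IQR) using NumPy's default quartile interpolation, which may differ from whatever plotting software produced the figure.

```python
import numpy as np


def precision_recall_fbeta(y_true, y_pred, beta=0.5):
    """Precision, recall and F-beta for binary labels (1 = driver/positive).

    Assumption: "F50" is read as the F-beta score with beta = 0.5,
    which weights precision more heavily than recall.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # true positives
    fp = np.sum(~y_true & y_pred)   # false positives
    fn = np.sum(y_true & ~y_pred)   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return float(precision), float(recall), float(fbeta)


def tukey_whiskers(values):
    """Whisker ends of a Tukey boxplot: the most extreme data points lying
    within [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. Quartiles use NumPy's default
    linear interpolation."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = x[(x >= lo_fence) & (x <= hi_fence)]
    return float(inside.min()), float(inside.max())


# Illustrative (made-up) labels: a conservative classifier with perfect
# precision but low recall scores higher under beta = 0.5 than under F1.
p, r, f05 = precision_recall_fbeta([1, 1, 1, 0, 0, 0], [1, 0, 0, 0, 0, 0])
_, _, f1 = precision_recall_fbeta([1, 1, 1, 0, 0, 0], [1, 0, 0, 0, 0, 0], beta=1.0)
print(p, r, f05, f1)

# The outlier 100 falls outside the upper fence, so the upper whisker stops at 9.
print(tukey_whiskers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

In the usage lines, precision is 1, recall is 1/3, and F-beta is 5/7 with beta = 0.5 versus 0.5 with beta = 1, which illustrates why a precision-weighted score favours conservative driver calls.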