Extended Data Fig. 4: Extended nucleotide contexts contribute to the performance of the composite likelihood model. | Nature Genetics

From: Identification of cancer driver genes based on nucleotide context

We examined whether accounting for extended contexts beyond trinucleotide contexts improved the fit of the composite likelihood model. To this end, we varied the number of flanking nucleotides in the composite likelihood model between 0 (i.e., only substitution types) and 6 (i.e., 7-nucleotide contexts). We computed the residual sum of squared differences between observed mutation counts and the predictions of the composite likelihood model. As a negative control, we determined the residual sum of squares for a uniform distribution and used this baseline to normalize the residual sum of squares for each cancer type. For some cancer types with ‘flat’ mutation signatures, nucleotide contexts had only a minor impact on the fit of the model, but did not decrease its performance (e.g., lung adenocarcinoma, n = 446 samples). For other cancer types, the fit of the model depended largely on the trinucleotide context, but not on the extended nucleotide context (e.g., prostate cancer, n = 880). For most cancer types with high background mutation rates, the fit of the composite likelihood model depended strongly on the extended nucleotide context (e.g., bladder, n = 317; breast, n = 1443; cervical, n = 192; colorectal, n = 223; endometrial cancer, n = 327; melanoma, n = 582).
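
The normalization described in the caption can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `normalized_rss` is hypothetical, and two details are assumptions made here: the uniform baseline is taken to spread the total observed count evenly across all categories, and normalization is taken to mean dividing the model's residual sum of squares by the baseline's. Under these assumptions, a value near 0 indicates a close fit and a value near 1 indicates a fit no better than the uniform baseline.

```python
def normalized_rss(observed, predicted):
    """Residual sum of squares of a model, relative to a uniform baseline.

    observed, predicted: equal-length sequences of mutation counts per
    category (hypothetical inputs; how categories are defined is up to
    the caller).
    """
    if len(observed) == 0 or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")

    # Residual sum of squares for the model predictions.
    rss_model = sum((o - p) ** 2 for o, p in zip(observed, predicted))

    # Assumed uniform baseline: every category receives the same expected count.
    uniform_count = sum(observed) / len(observed)
    rss_uniform = sum((o - uniform_count) ** 2 for o in observed)

    if rss_uniform == 0:
        raise ValueError("observed counts are already uniform; the ratio is undefined")

    return rss_model / rss_uniform


# Toy counts for four categories (made-up numbers, for illustration only).
observed_counts = [10, 30, 20, 40]
model_predictions = [12, 28, 22, 38]
print(normalized_rss(observed_counts, model_predictions))
```

Dividing by the baseline makes the values comparable across cancer types whose total mutation counts differ by orders of magnitude.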
