Cytarabine (1-β-D-arabinofuranosylcytosine; Ara-C) is an effective remission induction treatment for acute myeloid leukaemia (AML).1,2 However, most patients relapse within 3 years, often with aggressive AML,2 characterised by the acquisition of novel base substitution mutations in genes including IDH2 and TET2.3,4 Chemotherapy used to treat AML is predicted to induce mutations detectable in relapsed disease, although direct evidence for this is lacking. The cytotoxicity of Ara-C is mediated in part by its incorporation into replicating DNA, leading to inhibition of chain extension. However, even at concentrations that induce significant cell death, arabinofuranosylcytosine triphosphate, the active metabolite of Ara-C, is primarily incorporated into DNA at internucleotide positions,5,6 meaning that DNA synthesis frequently continues past the incorporated analogue. This suggests that Ara-C is likely to be mutagenic and could contribute to the aetiology of novel somatic mutations acquired during AML relapse.

To assess the mutagenicity of Ara-C, mutation frequency (Mf) was determined in TK6 cells in vitro at two widely used reporter genes, thymidine kinase (TK) and hypoxanthine-guanine phosphoribosyltransferase (HPRT). The procedure for in vitro quantitative mutation assays developed by Liber and Thilly7 involves treating cells under defined conditions, followed by incubation in non-selective medium for long enough to allow phenotypic expression of the chosen genetic marker. Cells were then plated in the presence of a selective agent, to determine the number of mutant cells able to form colonies, and in its absence, to determine the intrinsic cloning efficiency of the cells. Mf was calculated after adjustment for cloning efficiency, as described in detail in Supplementary Methods.
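As an illustration of the cloning-efficiency adjustment, the sketch below assumes a microwell (96-well) plating format in which cloning efficiency is estimated from the fraction of empty wells using the Poisson distribution, a common approach in TK6 mutation assays. The exact procedure used in this study is the one given in Supplementary Methods; the function names and any plating numbers are hypothetical.

```python
import math


def plating_efficiency(empty_wells: int, total_wells: int, cells_per_well: float) -> float:
    """Poisson estimate of cloning efficiency from a microwell plating.

    If colonies arise independently, the fraction of wells without a colony is
    P0 = exp(-CE * cells_per_well), so CE = -ln(P0) / cells_per_well.
    """
    if not 0 < empty_wells <= total_wells:
        raise ValueError("need at least one empty well and no more than total_wells")
    p0 = empty_wells / total_wells
    return -math.log(p0) / cells_per_well


def mutation_frequency(sel_empty, sel_total, sel_cells_per_well,
                       nonsel_empty, nonsel_total, nonsel_cells_per_well):
    """Mf = cloning efficiency under selection / cloning efficiency without selection.

    Dividing by the non-selective cloning efficiency corrects the mutant count
    for cells that would have failed to form colonies regardless of genotype.
    """
    ce_sel = plating_efficiency(sel_empty, sel_total, sel_cells_per_well)
    ce_nonsel = plating_efficiency(nonsel_empty, nonsel_total, nonsel_cells_per_well)
    return ce_sel / ce_nonsel
```

For example, `mutation_frequency(374, 384, 2e4, 96, 384, 1.6)` would describe a hypothetical plating with 2 × 10^4 cells per well under selection and 1.6 cells per well without selection.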

Ara-C produced a significant, dose-dependent increase in Mf in surviving cells at both loci, although it induced fewer mutations than the potent mutagen N-methyl-N-nitrosourea (MNU) at equitoxic doses (Figure 1a). Ara-C-induced Mf was next determined in isogenic cells deficient in MSH2,8 a component of DNA mismatch repair (MMR) whose loss confers sensitivity to mutagenesis by agents that induce base:base mispairs.9 MSH2-deficient cells were more susceptible to Ara-C-induced mutation than their MMR-proficient counterparts (Figures 1a and b and Supplementary Figure S1), and both Ara-C dose and MMR status had a significant influence on Mf (P<0.001). These data confirm the mutagenicity of this anti-leukaemic cytosine analogue and suggest that it is a base substitution mutagen.

Figure 1

Mutagenicity of Ara-C in vitro. Mf at the TK and HPRT loci was assessed in DNA MMR-proficient TK6 lymphoblastoid cells (a) or isogenic MMR-deficient MSH2i cells (b) following either 16 h exposure to Ara-C (blue squares) or 4 h exposure to MNU (orange triangles). Data represent the mean and standard error of three independently treated cell populations and are presented as Mf observed in vehicle-treated control cells (0% cytotoxicity) and at drug doses causing up to 95% cytotoxicity (see Supplementary Figure S1). Exponential curves are displayed for each dataset; the associated R² values represent the square of the Pearson product moment correlation coefficient, indicating the goodness of fit. (c) Spectrum of independent mutations observed at the HPRT locus following 16 h exposure to 30 nM Ara-C (inducing ~95% cytotoxicity) or vehicle control in B-lymphoblastoid TK6 cells. Coloured sections represent different structural classes of mutation. Numbers in parentheses indicate the proportion of mutants in each class, and the total number of mutants analysed (n) is shown above the bars. The P-value was calculated by Fisher's exact test comparing the distribution of structural mutation classes (large deletions, small deletions, base substitutions and partial gene duplications) in Ara-C-treated cells with that in vehicle-treated cells. (d and e) Sequence context of HPRT base substitution mutations from vehicle-treated and Ara-C-treated cells (see Supplementary Table S2). Heatmaps display the number of base substitution mutations at the central position (d) or any position (e) of all possible trinucleotide sequences in vehicle-treated (top) or Ara-C-treated (bottom) cells. Because it was not possible to determine whether a base substitution was initiated by misinsertion on the plus or minus strand, mutations in each trinucleotide sequence (the mutated base and one base 5′ and 3′ of it) were grouped with mutations in its reverse complement sequence. For vehicle-treated cells, the sequence context of 25 single base substitutions was analysed. For Ara-C-treated cells, the sequence context of 25 single base substitutions and 1 tandem base substitution (scored at both positions) was analysed.
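The strand-collapsing step described in the legend can be sketched as follows. Because the central base of a trinucleotide can never equal its own complement, no trinucleotide is its own reverse complement, so the 64 possible sequences fall into exactly 32 pairs (for example, 5′TGA3′ pairs with 5′TCA3′). Labelling each pair by its lexicographically smaller member is an arbitrary, hypothetical convention chosen for this sketch.

```python
from itertools import product

# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_context(trinuc: str) -> str:
    """Single label shared by a trinucleotide and its reverse complement."""
    trinuc = trinuc.upper()
    return min(trinuc, reverse_complement(trinuc))


# All 64 trinucleotides collapse into 32 strand-independent classes.
all_contexts = {canonical_context("".join(t)) for t in product("ACGT", repeat=3)}
```

With this grouping, a mutated G in 5′TGA3′ and a mutated C in 5′TCA3′ are counted in the same class, which corresponds to the G:C position discussed in the text.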

To determine whether Ara-C preferentially induces mutation at specific sequences in the genome, a mutation spectrum was generated at the HPRT locus. Cells were exposed to 30 nM Ara-C (or vehicle) for 16 h, and mutant colonies were selected as described by Liber and Thilly.7 A single mutant colony from each treated cell population was expanded in standard growth medium supplemented with 2 μg/ml 6-thioguanine, and RNA and DNA were extracted for subsequent characterisation (Supplementary Methods). This procedure was performed with up to 150 independently treated populations each for Ara-C and vehicle, ensuring that every mutant analysed arose independently. The mutation spectra from Ara-C-treated and vehicle-treated cells were significantly different (P=0.002), with a higher proportion of base substitutions in Ara-C-treated cells (27% vs 19%; Figure 1c and Supplementary Table S1). Strikingly, 7 of 27 (25.9%) base substitution mutations from Ara-C-treated clones affected the central G:C base pair in 5′TpGpA3′/5′TpCpA3′ sequences (Figure 1d and Supplementary Table S2), comprising 6 single base substitutions and 1 tandem base substitution at five different sites in the HPRT gene. Monte Carlo simulation gave a probability of 4.7 × 10−4 that 7 or more mutations would fall by chance at the central position of any one of the 32 possible trinucleotide sequences. Mutation at the central position of 5′TpGpA3′/5′TpCpA3′ sequences was observed only once in vehicle-treated control cells (1/25, 4.0%), and none of the other 31 possible trinucleotide sequences was represented more than twice in the Ara-C mutation spectrum (Figure 1d).
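A minimal Monte Carlo sketch of this kind of chance calculation is shown below. The null model is an assumption: each mutation is treated as landing independently in one of the 32 context classes with probability proportional to a weight. The text does not state the weights used; a realistic null would probably weight each class by its representation at mutable sites in HPRT, and the uniform weights in the example call are purely illustrative. The tandem substitution scored at two positions is also treated here as two independent events, which is a simplification.

```python
import random
from collections import Counter
from itertools import accumulate


def prob_max_class_at_least(n_mutations, k, weights, n_sims=100_000, seed=0):
    """Estimate P(at least one context class receives >= k of n_mutations).

    Each simulated mutation falls independently into class i with probability
    proportional to weights[i] (an assumed null model, not necessarily the
    one used in the study).
    """
    rng = random.Random(seed)          # seeded so the estimate is reproducible
    classes = range(len(weights))
    cum = list(accumulate(weights))    # precompute once for faster sampling
    hits = 0
    for _ in range(n_sims):
        draws = rng.choices(classes, cum_weights=cum, k=n_mutations)
        busiest = max(Counter(draws).values(), default=0)
        if busiest >= k:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # Illustrative only: 27 scored substitutions, threshold of 7, uniform weights.
    print(prob_max_class_at_least(27, 7, [1.0] * 32))
```

The estimate depends on the weights chosen, so this sketch is not expected to reproduce the reported value exactly.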

Ara-C induces significant perturbation of the DNA helix, which could cause mutation at neighbouring base positions.10 Consistent with this, five additional base substitution mutations in Ara-C-treated cells were identified at non-central positions in 5′TpGpA3′/5′TpCpA3′ sequences, whereas none were observed in vehicle-treated control cells (Figure 1e and Supplementary Table S2). In total, 12 of 27 (44.4%) base substitution mutations in Ara-C-treated cells occurred in 5′TpGpA3′/5′TpCpA3′ sequences, compared with only 1 of 25 (4.0%) in control cells. Monte Carlo simulation gave a probability of 2.6 × 10−4 that 12 or more mutations would fall by chance at any of the three base positions of any one of the 32 possible trinucleotide sequences. These data demonstrate that Ara-C preferentially induces mutation in 5′TpGpA3′/5′TpCpA3′ sequences, particularly at the central G:C position, consistent with its function as a cytosine analogue.
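The distinction between central and non-central positions can be made concrete with a small helper. It assumes each mutation is recorded with two flanking bases on each side of the mutated base (a hypothetical 5-base format). Because 5′TGA3′ and 5′TCA3′ are reverse complements of each other, checking the plus-strand windows against both sequences covers both strands.

```python
TARGETS = {"TGA", "TCA"}  # reverse complements of each other


def in_target_at_centre(context5: str, targets=TARGETS) -> bool:
    """True if the mutated base (index 2) is the central base of a target."""
    return context5.upper()[1:4] in targets


def in_target_at_any_position(context5: str, targets=TARGETS) -> bool:
    """True if the mutated base sits at the 5', central or 3' position of a target.

    context5: two 5' flanking bases, the mutated base, two 3' flanking bases.
    """
    context5 = context5.upper()
    if len(context5) != 5:
        raise ValueError("expected 5 bases centred on the mutated base")
    # The three trinucleotide windows that contain the mutated base.
    windows = (context5[0:3], context5[1:4], context5[2:5])
    return any(w in targets for w in windows)
```

Counting mutations with `in_target_at_centre` corresponds to the central-position analysis, while `in_target_at_any_position` also captures the neighbouring-base mutations described above.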

We next examined mutation at 5′TpGpA3′/5′TpCpA3′ sequences in AML patients treated with Ara-C. Whole-genome sequencing data from eight AML patients3 were analysed as described in Supplementary Methods. A total of 3587 and 452 somatic mutations were analysed from presentation and relapse disease, respectively. The frequency of mutation at the G:C position in 5′TpGpA3′/5′TpCpA3′ sequences was significantly higher in relapsed AML (22/452, 4.9%) than in chemotherapy-naive presentation AML (111/3587, 3.1%; odds ratio (OR) 2.2, 95% confidence interval (CI) 1.4–3.6; P=0.003; Table 1 and Supplementary Table S3), consistent with an Ara-C-induced mutation signature. Nevertheless, Ara-C makes a relatively small contribution to the relapse mutation load, and other mechanisms may be more important in driving the acquisition of novel mutations in relapsed samples: of the 452 novel base substitution mutations seen at relapse, the excess mutations at the G:C position in 5′TpGpA3′/5′TpCpA3′ sequences, potentially attributable to Ara-C, constitute less than 2%.
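The "less than 2%" figure can be checked with simple arithmetic, assuming the excess is defined relative to the presentation frequency (111/3587). Under that assumption, about 14 of the 452 relapse mutations would be expected at this position, so the roughly 8 excess mutations make up about 1.8% of relapse mutations. The function name is hypothetical.

```python
def excess_fraction(relapse_hits, relapse_total, baseline_hits, baseline_total):
    """Relapse mutations at a context in excess of the baseline (presentation) rate.

    Returns (expected count, excess count, excess as a fraction of relapse mutations).
    """
    expected = relapse_total * baseline_hits / baseline_total
    excess = relapse_hits - expected
    return expected, excess, excess / relapse_total


# Counts from the text: 22/452 at relapse, 111/3587 at presentation.
expected, excess, frac = excess_fraction(22, 452, 111, 3587)
# expected ≈ 14.0, excess ≈ 8.0, frac ≈ 0.018 (under 2%)
```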

Table 1 Frequency of somatic mutation at the central base of trinucleotide sequences in AML patients at presentation (before Ara-C treatment) and relapse (post-Ara-C treatment)

To exclude the possibility that the specific acquisition of mutation at 5′TpGpA3′/5′TpCpA3′ sequences is intrinsic to disease progression, sequencing data obtained at diagnosis of myelodysplastic syndrome (MDS) were also analysed from three patients who subsequently progressed to AML.11 These patients had received no therapy other than supportive care for MDS (Supplementary Methods). The frequency of G:C mutations at 5′TpGpA3′/5′TpCpA3′ sequences was similar in MDS (34/1122, 3.0%) and presentation AML (13/428, 3.0%; OR 1.1, 95% CI 0.5–2.3, P=0.882; Supplementary Table S4), further implicating Ara-C as the cause of mutation at these sites in relapsed AML.

It will also be important to establish the contribution of other remission induction agents to mutation in relapsed AML. In addition to 5′TpGpA3′/5′TpCpA3′, the frequency of mutation at the central base of three other trinucleotides was significantly higher in relapsed AML than in presentation disease (Table 1), suggesting that other chemotherapy agents may also leave mutation signatures.

In summary, these data demonstrate that Ara-C preferentially induces mutation at 5′TpGpA3′/5′TpCpA3′ sequences, and that mutations at these sequences are significantly elevated in relapsed disease after exposure to Ara-C-containing regimens. Given the relationship between Ara-C dose and mutagenicity reported here, chemotherapy-induced mutagenicity could be an important consideration when developing AML treatment strategies that maximise the likelihood of remission while minimising the risk of mutation in surviving cells, which could contribute to the evolution of relapsed disease.