Mutations in cancer-associated genes drive tumour outgrowth, but our knowledge of the timing of driver mutations and subsequent clonal dynamics is limited (refs. 1–3). Here, using whole-genome sequencing of 1,013 clonal haematopoietic colonies from 12 patients with myeloproliferative neoplasms, we identified 580,133 somatic mutations to reconstruct haematopoietic phylogenies and determine clonal histories. Driver mutations were estimated to occur early in life, including the in utero period. JAK2V617F was estimated to have been acquired by 33 weeks of gestation to 10.8 years of age in 5 patients in whom JAK2V617F was the first event. DNMT3A mutations were acquired by 8 weeks of gestation to 7.6 years of age in 4 patients, and a PPM1D mutation was acquired by 5.8 years of age. Additional genomic events occurred before or following JAK2V617F acquisition and as independent clonal expansions. Sequential driver mutation acquisition was separated by decades across life, often outcompeting ancestral clones. The mean latency between JAK2V617F acquisition and diagnosis was 30 years (range 11–54 years). Estimated historical rates of clonal expansion varied substantially (3% to 190% per year), increased with additional driver mutations, and predicted latency to diagnosis. Our study suggests that early driver mutation acquisition and life-long growth and evolution underlie adult myeloproliferative neoplasms, raising opportunities for earlier intervention and a new model for cancer development.
Whole-genome sequencing data in the form of BAM files across all samples reported in this study have been deposited in the European Genome–Phenome Archive (https://www.ebi.ac.uk/ega/home) with accession codes EGAD00001007714 (whole-genome sequencing colonies) and EGAD00001007715 (targeted-recapture sequencing). Per-patient VCF files containing information on the somatic mutations identified are available on Mendeley (doi: 10.17632/hrmxybrd2n.1).
Single-nucleotide substitutions (SNVs) were called using the cancer variants through expectation maximization (CaVEMan) algorithm, version 1.13.14 (https://github.com/cancerit/CaVEMan). Small insertions and deletions were called using the Pindel algorithm as implemented in the cgpPindel workflow, version 3.2.0 (https://github.com/cancerit/cgpPindel). Copy number variants were called using the ASCAT algorithm as implemented in the ascatNgs workflow, version 3.2.0 (https://github.com/cancerit/ascatNgs). Mutational signature analysis was performed using MutationalPatterns v1.10, available on GitHub (https://github.com/UMCUGenetics/MutationalPatterns), and SigProfiler (https://github.com/AlexandrovLab). Allele counts at SNV and indel sites were obtained using vafCorrect (https://github.com/cancerit/vafCorrect). Telomere lengths were estimated using telomerecat, version 3.2 (https://github.com/cancerit/telomerecat). Mutations were mapped to phylogenetic branches using Rtreemut, developed for this study (https://github.com/NickWilliamsSanger/treemut). Temporal branch lengths and per-driver mutation rates were inferred using rtreefit, developed for this study (https://github.com/NickWilliamsSanger/rtreefit). Simulations of HSC populations and phylogenies with selection were carried out using rsimpop, developed for this study (https://github.com/NickWilliamsSanger/rsimpop). Other analyses were carried out using custom R scripts available at https://github.com/NickWilliamsSanger/mpn_phylogenies_and_evolution.
ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
Gerstung, M. et al. The evolutionary history of 2,658 cancers. Nature 578, 122–128 (2020).
Mitchell, T. J. et al. Timing the landmark events in the evolution of clear cell renal cell cancer: TRACERx Renal. Cell 173, 611–623.e17 (2018).
Welch, J. S. et al. The origin and evolution of mutations in acute myeloid leukemia. Cell 150, 264–278 (2012).
Alexandrov, L. B. et al. Clock-like mutational processes in human somatic cells. Nat. Genet. 47, 1402–1407 (2015).
Radivoyevitch, T., Hlatky, L., Landaw, J. & Sachs, R. K. Quantitative modeling of chronic myeloid leukemia: insights from radiobiology. Blood 119, 4363–4371 (2012).
Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).
Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).
Brunner, S. F. et al. Somatic mutations and clonal dynamics in healthy and cirrhotic human liver. Nature 574, 538–542 (2019).
Moore, L. et al. The mutational landscape of normal human endometrial epithelium. Nature 580, 640–646 (2020).
Martincorena, I. et al. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).
Yokoyama, A. et al. Age-related remodelling of oesophageal epithelia by mutated cancer drivers. Nature 565, 312–317 (2019).
Genovese, G. et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 371, 2477–2487 (2014).
Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).
Xie, M. et al. Age-related mutations associated with clonal hematopoietic expansion and malignancies. Nat. Med. 20, 1472–1478 (2014).
Young, A. L., Challen, G. A., Birmann, B. M. & Druley, T. E. Clonal haematopoiesis harbouring AML-associated mutations is ubiquitous in healthy adults. Nat. Commun. 7, 12484 (2016).
Abelson, S. et al. Prediction of acute myeloid leukaemia risk in healthy individuals. Nature 559, 400–404 (2018).
Vainchenker, W. & Kralovics, R. Genetic basis and molecular pathophysiology of classical myeloproliferative neoplasms. Blood 129, 667–679 (2017).
Grinfeld, J. et al. Classification and personalized prognosis in myeloproliferative neoplasms. N. Engl. J. Med. 379, 1416–1430 (2018).
Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).
Osorio, F. G. et al. Somatic mutations reveal lineage relationships and age-related mutagenesis in human hematopoiesis. Cell Rep. 25, 2308–2316.e4 (2018).
Abascal, F. et al. Somatic mutation landscapes at single-molecule resolution. Nature 593, 405–410 (2021).
Mitchell, E. et al. Clonal dynamics of haematopoiesis across the human lifespan. Preprint at https://doi.org/10.1101/2021.08.16.456475 (2021).
Nangalia, J. et al. DNMT3A mutations occur early or late in patients with myeloproliferative neoplasms and mutation order influences phenotype. Haematologica 100, 438–442 (2015).
Ortmann, C. A. et al. Effect of mutation order on myeloproliferative neoplasms. N. Engl. J. Med. 372, 601–612 (2015).
Lundberg, P. et al. Clonal evolution and clinical correlates of somatic mutations in myeloproliferative neoplasms. Blood 123, 2220–2228 (2014).
Godfrey, A. L. et al. JAK2V617F homozygosity arises commonly and recurrently in PV and ET, but PV is characterized by expansion of a dominant homozygous subclone. Blood 120, 2704–2707 (2012).
Kahn, J. D. et al. PPM1D-truncating mutations confer resistance to chemotherapy and sensitivity to PPM1D inhibition in hematopoietic cells. Blood 132, 1095–1105 (2018).
Vaziri, H. et al. Evidence for a mitotic clock in human hematopoietic stem cells: loss of telomeric DNA with age. Proc. Natl Acad. Sci. USA 91, 9857–9860 (1994).
Anand, S. et al. Effects of the JAK2 mutation on the hematopoietic stem and progenitor compartment in human myeloproliferative neoplasms. Blood 118, 177–181 (2011).
Chapman, M. S. et al. Lineage tracing of human development through somatic mutations. Nature 595, 85–90 (2021).
de Kanter, J. K. et al. Antiviral treatment causes a unique mutational signature in cancers of transplantation recipients. Cell Stem Cell 28, 1726–1739 (2021).
Titmarsh, G. J. et al. How common are myeloproliferative neoplasms? A systematic review and meta-analysis. Am. J. Hematol. 89, 581–587 (2014).
Mehta, J., Wang, H., Iqbal, S. U. & Mesa, R. Epidemiology of myeloproliferative neoplasms in the United States. Leuk. Lymphoma 55, 595–600 (2014).
Van Egeren, D. et al. Reconstructing the lineage histories and differentiation trajectories of individual cancer cells in myeloproliferative neoplasms. Cell Stem Cell 28, 514–523.e9 (2021).
McKerrell, T. et al. JAK2 V617F hematopoietic clones are present several years prior to MPN diagnosis and follow different expansion kinetics. Blood Adv. 1, 968–971 (2017).
Hirsch, P. et al. Clonal history of a cord blood donor cell leukemia with prenatal somatic JAK2 V617F mutation. Leukemia 30, 1756–1759 (2016).
Wong, W. H. et al. Engraftment of rare, pathogenic donor hematopoietic mutations in unrelated hematopoietic stem cell transplantation. Sci. Transl. Med. 12, eaax6249 (2020).
Olcaydu, D. et al. A common JAK2 haplotype confers susceptibility to myeloproliferative neoplasms. Nat. Genet. 41, 450–454 (2009).
Hinds, D. A. et al. Germ line variants predispose to both JAK2 V617F clonal hematopoiesis and myeloproliferative neoplasms. Blood 128, 1121–1128 (2016).
Fleischman, A. G. Inflammation as a driver of clonal evolution in myeloproliferative neoplasm. Mediators Inflamm. 2015, 606819 (2015).
Bick, A. G. et al. Inherited causes of clonal haematopoiesis in 97,691 whole genomes. Nature 586, 763–768 (2020).
Bao, E. L. et al. Inherited myeloproliferative neoplasm risk affects haematopoietic stem cells. Nature 586, 769–775 (2020).
Watson, C. J. et al. The evolutionary dynamics and fitness landscape of clonal hematopoiesis. Science 367, 1449–1454 (2020).
Arber, D. A. et al. The 2016 revision to the World Health Organization classification of myeloid neoplasms and acute leukemia. Blood 127, 2391–2405 (2016).
Nielsen, C., Birgens, H. S., Nordestgaard, B. G. & Bojesen, S. E. Diagnostic value of JAK2 V617F somatic mutation for myeloproliferative cancer in 49 488 individuals from the general population. Br. J. Haematol. 160, 70–79 (2013).
Kiladjian, J. J. et al. Pegylated interferon-α-2a induces complete hematologic and molecular responses with low toxicity in polycythemia vera. Blood 112, 3065–3072 (2008).
Pieri, L. et al. JAK2V617F complete molecular remission in polycythemia vera/essential thrombocythemia patients treated with ruxolitinib. Blood 125, 3352–3353 (2015).
Nangalia, J. et al. Somatic CALR mutations in myeloproliferative neoplasms with nonmutated JAK2. N. Engl. J. Med. 369, 2391–2405 (2013).
Ellis, P. et al. Reliable detection of somatic mutations in solid tissues by laser-capture microdissection and low-input DNA sequencing. Nat. Protoc. 16, 841–871 (2021).
Jones, D. et al. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinforma. 56, 15.10.1–15.10.18 (2016).
Ye, K., Schulz, M. H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–2871 (2009).
Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).
Hoang, D. T. et al. MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation. BMC Evol. Biol. 18, 11 (2018).
Carpenter, B. et al. Stan: a probabilistic programming language. J. Stat. Softw. 76, 1–32 (2017).
Tavaré, S. The linear birth–death process: an inferential retrospective. Adv. Appl. Probab. 50, 253–269 (2018).
We thank Cambridge Blood and Stem Cell Biobank, funded by the Cambridge Cancer Centre and Wellcome Trust Cambridge Stem Cell Institute, Wellcome Sanger CASM and DNA pipelines for their assistance; and S. Behjati and C. Harrison for valuable discussion. The study was supported by Cancer Research UK (J.N.), EHA Research Award (J.N.), MPN Research Foundation (J.N.) and the Wellcome Trust (P.J.C., A.R.G. and J.L.). Work in the A.R.G. laboratory is supported by the Wellcome Trust, Bloodwise, Cancer Research UK, the Kay Kendall Leukaemia Fund and the Leukaemia and Lymphoma Society of America. J.N. is a CRUK Clinician Scientist fellow. We thank the patients for their participation in the study.
A patent has been filed by the Wellcome Sanger Institute (inventors N.W. and J.N.; Application number PCT/EP2021/071952) covering somatic mutation identification in the context of tumour contamination of the matched germline sample.
Peer review information
Nature thanks Steven McCarroll, Seishi Ogawa and the other, anonymous reviewers for their contribution to the peer review of this work. Peer review reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 legend. Patient characteristics and somatic mutation fractions in haematopoietic colonies.
a. Patient characteristics. PV, Polycythemia vera; ET, Essential thrombocythaemia; MF, myelofibrosis; HC, Hydroxycarbamide; IFN, Interferon-alpha; FU, follow-up. *PV diagnosed on red cell mass study. b. The distribution of variant allele fractions (VAF) for point mutations pooled across colonies per patient. The mean VAF of individual colonies is shown as red dots. Only autosomal somatic mutations are shown, with those in regions with copy-number aberrations and loss-of-heterozygosity excluded. The plot shows that the mean VAF is close to 0.5 for the majority of colonies.
a. Phylogenetic tree of PD5117 depicting 3 separate 9pUPD (UPD, uniparental disomy) acquisitions (blue branches), downstream of JAK2V617F (red branch). Below the phylogenetic tree are three B-allele frequency plots showing the regions of 9pUPD in the different clades, with vertical red lines showing the boundary of loss of heterozygosity. The event shown on the far right has a distinct breakpoint from the left two events. Blue and green vertical lines show somatic mutations (either prior or subsequent to the UPD event), suggesting that the 9pUPD event depicted in the middle plot occurred first, as more mutations have had time to accrue since the copy number aberration. b. Phylogenetic tree of PD5179 depicting two separate 1q+ (orange branches) and 9q- (blue branches) acquisitions. The left plot shows the aggregate VAF of germline single nucleotide polymorphisms (SNPs) on Chr1 for samples in the 1q+ major clade versus the 1q+ minor clade. SNPs at a VAF = 2/3 in one clade are at 1/3 in the other clade, and vice versa, confirming that different parental chromosomes are amplified in each clade. SNPs in the affected 9q- region also exhibit a clear pattern in VAF (right plot), with VAF = 0.5 for samples in the major 9q- clade but VAF = 0 or 1 for samples in the minor 9q- clade. A proposed model of chr9 copy number changes is shown in the upper right. c. Phylogenetic tree of PD4781 depicting two separate JAK2V617F acquisitions (red branches), each followed by 9pUPD (blue branches). JAK2V617F acquisition occurred on different parental alleles in each instance, as SNPs on 9p that have a VAF ~1 for samples in the major JAK2-mutant clade (horizontal bar coloured red) have a VAF ~0 in samples from the minor JAK2-mutant clade (horizontal bar coloured blue) and vice versa.
a. Mutation rate estimates for wildtype and different mutant clades within patients. Mutation acquisition is modelled using Poisson modelling, taking into account the timing of transition from wildtype to driver mutation acquisition within mutant clades and an excess mutation rate earlier in life (Methods). Patients and genotypes of clades are shown on the left together with colony number for each clade (N). Wildtype (WT) clades are shown in grey bars, JAK2-mutated clades are shown in red and other mutant clades are shown in yellow. The cohort-wide estimate for the mutation rate in WT colonies is shown by the dotted black vertical line at the top. *P < 0.05, **P < 0.01 (** also significant after multiple hypothesis testing; Bonferroni-adjusted, two-sided test). Significantly different mutation rates between clades are highlighted only for those significant by both Poisson and Negative Binomial modelling of mutation rates (Methods). Average mutation burdens are shown to the right for the different timepoints of sampling. b. Non-parametric comparison of mutation burdens in wildtype versus mutant colonies using limma’s rankSumTestWithCorrelation. This accounts for the non-independence of data in mutant colonies but does not account for the timing of driver mutation acquisition. * indicates significance at P < 0.05 following Bonferroni multiple hypothesis correction.
a. Signature contributions of SBS1, SBS5, SBS19 and SBS32 on a per-patient/per-clade basis. Single base substitution mutational signature 5 (SBS5), thought to represent a time-dependent mutational process active in all tissues, was the predominant mutational process in colonies. b. The proportion of C>T transitions at CpG dinucleotides across WT, JAK2-mutated and colonies with other driver mutations. *P < 0.05, **P < 0.01 (** also significant after multiple hypothesis testing; Chi-square test). c. The relationship between ‘sharedness’ (see Methods) and telomere length across all phylogenetic trees shows that telomeres shorten in line with increased phylogenetic ‘sharedness’, in keeping with the increased cell divisions during clonal expansion. d. The heritability of telomere length, that is, whether closely related colonies had more similar telomere lengths compared to more distantly related colonies, is assessed using Pagel’s Lambda and Blomberg’s K, with both values in the vicinity of 1 or above, suggesting that telomere length variation across colonies in a phylogenetic tree follows the expected covariance based on phylogenetic relationship. Power for PD5147 is limited because there is little difference in ‘sharedness’ in the mutant colonies. e. The modelled reduction in telomere length per additional stem cell division in JAK2 mutant clades is shown per patient, with a cohort-wide estimate of −57.4 bp (95% CI −74.2 to −40.59). See Supplementary Note 7 for further interpretation.
a. Time-based phylogenetic trees. Different coloured branches identify separate clades alongside light blue wild type colonies. The vertical axis represents age post conception with treatment received alongside. Driver mutations are depicted in the middle of the branches but may have occurred at any point between the start and end of the branches. Given the uncertainties in the exact ages at the starts and ends of the branches due to modelling branch lengths from mutation count data (Methods), the credibility intervals for the ends of the branches harbouring driver mutations are shown as black lines and also in b-c. b. Each horizontal grey box represents an individual patient from birth until the last colony sampling timepoint. The time before birth is represented on an expanded scale and is shaded pink. Within each grey box is shown the range of ages during which driver mutations and copy number aberrations are estimated to have occurred. The start and end of each coloured box represent the median lower and upper bounds of time estimates corresponding to the start and end of the shared branches harbouring driver mutations. Thus, the upper bounds (right edge of the coloured boxes) represent the latest time by which mutation acquisition is estimated to have occurred from phylogenetic analysis. Black lines show the 95% credibility intervals for the start and end of the branches carrying the drivers. Mutation timings are inferred from a model where mutation accumulation within branches follows a Poisson distribution but were not substantially different when using a Negative Binomial model. Diamonds show age at diagnosis. c. Raw data from a-b are shown with 95% credibility intervals around the estimated ages of the starts and ends of branches harbouring driver mutations for different patients, together with adjusted SNV counts for branches.
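As a rough illustration of how SNV counts at the start and end of a driver-carrying branch translate into the age bounds described above, the following minimal sketch assumes a single constant mutation rate. This is a deliberate simplification: the study's model (rtreefit) is Bayesian and includes an excess mutation rate early in life, neither of which is reproduced here. The function names, the example rate and the 40-week conversion from conception to birth are all hypothetical choices made for illustration.

```python
# Toy conversion of branch SNV counts to age bounds.
# Assumption: one constant rate (SNVs per year) applies from conception onwards;
# the published model does NOT make this assumption.

GESTATION_YEARS = 40 / 52  # assumption: about 40 weeks between conception and birth


def branch_age_bounds(snvs_at_start, snvs_at_end, snvs_per_year):
    """Return (earliest, latest) age post conception, in years, for a branch.

    snvs_at_start / snvs_at_end are cumulative SNV counts from the root of the
    tree to the start and end of the branch that carries the driver mutation.
    """
    if snvs_per_year <= 0:
        raise ValueError("snvs_per_year must be positive")
    if not 0 <= snvs_at_start <= snvs_at_end:
        raise ValueError("expected 0 <= snvs_at_start <= snvs_at_end")
    return snvs_at_start / snvs_per_year, snvs_at_end / snvs_per_year


def describe_age(years_post_conception):
    """Format an age post conception as weeks (before birth) or years (after birth)."""
    if years_post_conception < GESTATION_YEARS:
        return f"{years_post_conception * 52:.0f} weeks post conception"
    return f"{years_post_conception - GESTATION_YEARS:.1f} years of age"


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the study.
    start, end = branch_age_bounds(9, 180, snvs_per_year=18.0)
    print(describe_age(start), "to", describe_age(end))
```

The upper bound is the quantity that corresponds to "the latest time by which mutation acquisition is estimated to have occurred"; the driver itself may have arisen anywhere along the branch.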
a. The figure shows the smoothed posterior density distribution of the selection coefficient (proportion additional growth per year) vs driver timing for all analysed clades from population simulations and approximate Bayesian computation (ABC). Marginal distributions are also shown. The prior distribution for driver timing is clade dependent and is largely determined by the mutation count at the start and end of the associated branch. Both clonal fractions and lineages-through-time were used as summary statistics in the approximate Bayesian computation for estimates of selection. Main plots show driver mutations acquired after birth, and driver mutations pre-birth are shown within the black box, taking into account driver mutation acquisition during a time when the background stem cell population size is modelled to be growing. b. Data from a. in tabular format. Here, selection coefficients have been converted to clonal expansion (median growth % per year, Selection). The ABC approach gives alternative estimates for ages of driver mutation acquisition as shown. N depicts the number of simulations per clade. Clones with sufficient immediate descendants (>5 coalescences) were included for estimates of selection. c. Comparison of estimates of selection of mutant clades (each labelled by patient ID and driver mutation) from ABC versus Phylofit. The grey lines show 95% credibility intervals for estimates from each approach. Correlation coefficient r = 0.96. Note that the PD5182 and PD5847 in utero DNMT3A expansions from panel a. are not shown because only the ABC approach, and not Phylofit, allowed for modelling selection against a growing background population.
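The ABC procedure referred to in this legend can be sketched generically as rejection sampling: draw parameters from a prior, simulate, and keep the draws whose summary statistics lie closest to the observed data. The sketch below is not the study's implementation (which used population simulations with rsimpop and combined clonal fractions with lineages-through-time). It assumes a deterministic logistic growth curve as a stand-in simulator, uses the final clonal fraction as the only summary statistic, and uses hypothetical uniform priors and a hypothetical stem cell pool size.

```python
import math
import random


def logistic_fraction(s, t_driver, t_sample, n_stem=1e5):
    """Toy simulator: clonal fraction at t_sample for a clone founded at t_driver.

    Assumptions (hypothetical): the clone starts as 1 of n_stem cells and its
    odds of occupancy grow exponentially at rate s per year (logistic growth).
    """
    elapsed = max(0.0, t_sample - t_driver)
    x0 = 1.0 / n_stem
    odds = x0 / (1.0 - x0) * math.exp(s * elapsed)
    return odds / (1.0 + odds)


def abc_rejection(observed_fraction, t_sample, n_sims=20000, keep_fraction=0.01, seed=0):
    """Return the closest simulations as (distance, s, t_driver) tuples, best first."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        s = rng.uniform(0.0, 2.0)              # hypothetical prior on selection
        t_driver = rng.uniform(0.0, t_sample)  # hypothetical prior on driver timing
        simulated = logistic_fraction(s, t_driver, t_sample)
        sims.append((abs(simulated - observed_fraction), s, t_driver))
    sims.sort(key=lambda row: row[0])
    n_keep = max(1, round(n_sims * keep_fraction))
    return sims[:n_keep]


if __name__ == "__main__":
    accepted = abc_rejection(observed_fraction=0.5, t_sample=60.0)
    s_values = sorted(row[1] for row in accepted)
    print("accepted:", len(accepted), "median s:", s_values[len(s_values) // 2])
```

With a single summary statistic, selection and driver timing are strongly confounded in the accepted set (an early, slow clone and a late, fast clone reach the same fraction), which illustrates why a second statistic such as lineages-through-time and a clade-specific timing prior are informative.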
Extended Data Fig. 7 legend: Aberrant cell fractions in bulk blood samples and validation of selection estimates.
a. Plots showing aberrant cell fraction (ACF) in colonies and bulk longitudinal mature blood cell samples. Colony samples were derived from peripheral blood (red dots) or bone marrow (orange dots, in PD5182 and PD5847) mononuclear cells. Bulk mature blood cell samples comprised mostly peripheral blood granulocytes (black dots) and, occasionally, bone marrow-derived (grey dots) granulocytes (in PD5847, PD6629) or mononuclear cells (in PD5182), and whole blood (brown dots, in PD9478, PD6629). ACF in colonies is the clone's proportion of all colonies. In bulk samples, ACF is calculated as twice the mean VAF of variants that map to the shared ancestral branch of the clone. The x-axis is patient age at sample timepoints. Lines depict the inferred ACF trajectories from the top 0.01% of simulations from approximate Bayesian computation. Black lines, median ACF; grey lines, 95% CI; dotted line, inferred future growth trajectory beyond the sampling time using the growth rate S and accounting for a sigmoid clonal trajectory as clonal dominance is approached. b. 95% confidence intervals for the difference in parent branch and aggregate descendant daughter branch ACFs from phylogenetic tree clades. Confidence intervals are calculated assuming a normal sampling distribution of aggregate mutant read fractions for each branch. Diamonds indicate those recapture samples closest to the colony sampling. * denotes interferon treatment at time of sampling. c. Comparison of estimates of selection coefficients for clades with single driver mutations using Phylofit fitted using the branching pattern within the tree (lineage through time, LTT) and ACF (horizontal axis), versus selection coefficient estimates using just the branching pattern of the tree (LTT) and no ACF (vertical axis), to identify clades that show early rapid branching but smaller than expected final clonal fractions. 95% credibility intervals for selection coefficients are shown as grey lines and the corresponding median estimates as black dots. Possible early faster expansions are seen in two in utero mutated-DNMT3A clades (PD5182 and PD5847) and the JAK2V617F clade in PD5163 prior to interferon therapy.
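The two ACF definitions given in panel a can be written out as a short sketch. The function names and example read counts are hypothetical. The factor of two in the bulk calculation presumes heterozygous variants in diploid regions, and capping the result at 1 is an added assumption to guard against sampling noise rather than something stated in the legend.

```python
from statistics import mean


def colony_acf(n_clone_colonies, n_total_colonies):
    """ACF in colonies: the clone's proportion of all colonies sampled."""
    if n_total_colonies <= 0:
        raise ValueError("n_total_colonies must be positive")
    if not 0 <= n_clone_colonies <= n_total_colonies:
        raise ValueError("n_clone_colonies must lie between 0 and n_total_colonies")
    return n_clone_colonies / n_total_colonies


def bulk_acf(mutant_reads, depths):
    """ACF in a bulk sample: twice the mean VAF of variants on the clone's
    shared ancestral branch.

    mutant_reads and depths are parallel lists, one entry per variant.
    Sites with zero depth are skipped. The cap at 1.0 is an assumption.
    """
    vafs = [m / d for m, d in zip(mutant_reads, depths) if d > 0]
    if not vafs:
        raise ValueError("no variants with non-zero depth")
    return min(1.0, 2 * mean(vafs))


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    print(colony_acf(30, 100))
    print(bulk_acf([10, 20, 15], [100, 100, 100]))
```

Averaging per-variant VAFs weights every variant equally regardless of depth; pooling reads across variants before dividing would be an alternative design, and the legend does not say which the authors used for panel a.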
About this article
Cite this article
Williams, N., Lee, J., Mitchell, E. et al. Life histories of myeloproliferative neoplasms inferred from phylogenies. Nature 602, 162–168 (2022). https://doi.org/10.1038/s41586-021-04312-6