Life histories of myeloproliferative neoplasms inferred from phylogenies


Mutations in cancer-associated genes drive tumour outgrowth, but our knowledge of the timing of driver mutations and subsequent clonal dynamics is limited (refs. 1–3). Here, using whole-genome sequencing of 1,013 clonal haematopoietic colonies from 12 patients with myeloproliferative neoplasms, we identified 580,133 somatic mutations to reconstruct haematopoietic phylogenies and determine clonal histories. Driver mutations were estimated to occur early in life, including the in utero period. JAK2V617F was estimated to have been acquired by 33 weeks of gestation to 10.8 years of age in 5 patients in whom JAK2V617F was the first event. DNMT3A mutations were acquired by 8 weeks of gestation to 7.6 years of age in 4 patients, and a PPM1D mutation was acquired by 5.8 years of age. Additional genomic events occurred before or following JAK2V617F acquisition and as independent clonal expansions. Sequential driver mutation acquisition was separated by decades across life, often outcompeting ancestral clones. The mean latency between JAK2V617F acquisition and diagnosis was 30 years (range 11–54 years). Estimated historical rates of clonal expansion varied substantially (3% to 190% per year), increased with additional driver mutations, and predicted latency to diagnosis. Our study suggests that early driver mutation acquisition and life-long growth and evolution underlie adult myeloproliferative neoplasms, raising opportunities for earlier intervention and a new model for cancer development.
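
As a rough aid to reading the reported range of clonal expansion rates, the sketch below converts an annual growth percentage into a doubling time. It assumes constant exponential growth, which is a simplification and not the model fitted in the study; the function name `doubling_time_years` is hypothetical.

```python
import math


def doubling_time_years(growth_percent_per_year: float) -> float:
    """Years for a clone to double, assuming constant exponential growth."""
    if growth_percent_per_year <= 0:
        raise ValueError("growth must be positive")
    annual_factor = 1.0 + growth_percent_per_year / 100.0
    return math.log(2.0) / math.log(annual_factor)


# The two ends of the range quoted in the abstract.
for rate in (3.0, 190.0):
    years = doubling_time_years(rate)
    print(f"{rate:.0f}% per year -> doubling time of about {years:.1f} years")
```

Under this assumption, the slowest clones would take decades to double while the fastest would double in under a year, which is consistent with the wide spread of latencies to diagnosis described above.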

Fig. 1: Patient cohort and experimental design.
Fig. 2: Phylogenetic histories of three patients with MPN driven by JAK2V617F.
Fig. 3: Phylogenetic histories of 7 patients with JAK2V617F-mutated MPN and clonal evolution.
Fig. 4: Phylogenetic trees, mutation burdens and telomere lengths.
Fig. 5: Clonal fitness and latency to diagnosis.

Data availability

Whole-genome sequencing data in the form of BAM files across all samples reported in this study have been deposited in the European Genome–Phenome Archive with accession codes EGAD00001007714 (whole-genome sequencing of colonies) and EGAD00001007715 (targeted-recapture sequencing). Per-patient VCF files containing information on the somatic mutations identified are available on Mendeley (doi: 10.17632/hrmxybrd2n.1).

Code availability

Single-nucleotide substitutions (SNVs) were called using the cancer variants through expectation maximization (CaVEMan) algorithm, version 1.13.14. Small insertions and deletions were called using the Pindel algorithm as implemented in the cgpPindel workflow, version 3.2.0. Copy number variants were called using the ASCAT algorithm as implemented in the ascatNgs workflow, version 3.2.0. Mutational signature analysis was performed using MutationalPatterns v1.10, available on GitHub, and SigProfiler. Allele counting at SNV and indel sites was carried out using vafCorrect. Telomere lengths were estimated using telomerecat, version 3.2. Mutations were mapped to phylogenetic branches using Rtreemut, developed for this study. Temporal branch lengths and per driver mutation rates were inferred using rtreefit, developed for this study. Simulations of HSC populations and phylogenies with selection were carried out using rsimpop, developed for this study. Other analyses were carried out using custom R scripts.


References

  1. ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).

  2. Gerstung, M. et al. The evolutionary history of 2,658 cancers. Nature 578, 122–128 (2020).

  3. Mitchell, T. J. et al. Timing the landmark events in the evolution of clear cell renal cell cancer: TRACERx Renal. Cell 173, 611–623.e17 (2018).

  4. Welch, J. S. et al. The origin and evolution of mutations in acute myeloid leukemia. Cell 150, 264–278 (2012).

  5. Alexandrov, L. B. et al. Clock-like mutational processes in human somatic cells. Nat. Genet. 47, 1402–1407 (2015).

  6. Radivoyevitch, T., Hlatky, L., Landaw, J. & Sachs, R. K. Quantitative modeling of chronic myeloid leukemia: insights from radiobiology. Blood 119, 4363–4371 (2012).

  7. Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).

  8. Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).

  9. Brunner, S. F. et al. Somatic mutations and clonal dynamics in healthy and cirrhotic human liver. Nature 574, 538–542 (2019).

  10. Moore, L. et al. The mutational landscape of normal human endometrial epithelium. Nature 580, 640–646 (2020).

  11. Martincorena, I. et al. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).

  12. Yokoyama, A. et al. Age-related remodelling of oesophageal epithelia by mutated cancer drivers. Nature 565, 312–317 (2019).

  13. Genovese, G. et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 371, 2477–2487 (2014).

  14. Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).

  15. Xie, M. et al. Age-related mutations associated with clonal hematopoietic expansion and malignancies. Nat. Med. 20, 1472–1478 (2014).

  16. Young, A. L., Challen, G. A., Birmann, B. M. & Druley, T. E. Clonal haematopoiesis harbouring AML-associated mutations is ubiquitous in healthy adults. Nat. Commun. 7, 12484 (2016).

  17. Abelson, S. et al. Prediction of acute myeloid leukaemia risk in healthy individuals. Nature 559, 400–404 (2018).

  18. Vainchenker, W. & Kralovics, R. Genetic basis and molecular pathophysiology of classical myeloproliferative neoplasms. Blood 129, 667–679 (2017).

  19. Grinfeld, J. et al. Classification and personalized prognosis in myeloproliferative neoplasms. N. Engl. J. Med. 379, 1416–1430 (2018).

  20. Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).

  21. Osorio, F. G. et al. Somatic mutations reveal lineage relationships and age-related mutagenesis in human hematopoiesis. Cell Rep. 25, 2308–2316.e4 (2018).

  22. Abascal, F. et al. Somatic mutation landscapes at single-molecule resolution. Nature 593, 405–410 (2021).

  23. Mitchell, E. et al. Clonal dynamics of haematopoiesis across the human lifespan. Preprint at (2021).

  24. Nangalia, J. et al. DNMT3A mutations occur early or late in patients with myeloproliferative neoplasms and mutation order influences phenotype. Haematologica 100, 438–442 (2015).

  25. Ortmann, C. A. et al. Effect of mutation order on myeloproliferative neoplasms. N. Engl. J. Med. 372, 601–612 (2015).

  26. Lundberg, P. et al. Clonal evolution and clinical correlates of somatic mutations in myeloproliferative neoplasms. Blood 123, 2220–2228 (2014).

  27. Godfrey, A. L. et al. JAK2V617F homozygosity arises commonly and recurrently in PV and ET, but PV is characterized by expansion of a dominant homozygous subclone. Blood 120, 2704–2707 (2012).

  28. Kahn, J. D. et al. PPM1D-truncating mutations confer resistance to chemotherapy and sensitivity to PPM1D inhibition in hematopoietic cells. Blood 132, 1095–1105 (2018).

  29. Vaziri, H. et al. Evidence for a mitotic clock in human hematopoietic stem cells: loss of telomeric DNA with age. Proc. Natl Acad. Sci. USA 91, 9857–9860 (1994).

  30. Anand, S. et al. Effects of the JAK2 mutation on the hematopoietic stem and progenitor compartment in human myeloproliferative neoplasms. Blood 118, 177–181 (2011).

  31. Chapman, M. S. et al. Lineage tracing of human development through somatic mutations. Nature 595, 85–90 (2021).

  32. de Kanter, J. K. et al. Antiviral treatment causes a unique mutational signature in cancers of transplantation recipients. Cell Stem Cell 28, 1726–1739 (2021).

  33. Titmarsh, G. J. et al. How common are myeloproliferative neoplasms? A systematic review and meta-analysis. Am. J. Hematol. 89, 581–587 (2014).

  34. Mehta, J., Wang, H., Iqbal, S. U. & Mesa, R. Epidemiology of myeloproliferative neoplasms in the United States. Leuk. Lymphoma 55, 595–600 (2014).

  35. Van Egeren, D. et al. Reconstructing the lineage histories and differentiation trajectories of individual cancer cells in myeloproliferative neoplasms. Cell Stem Cell 28, 514–523.e9 (2021).

  36. McKerrell, T. et al. JAK2 V617F hematopoietic clones are present several years prior to MPN diagnosis and follow different expansion kinetics. Blood Adv. 1, 968–971 (2017).

  37. Hirsch, P. et al. Clonal history of a cord blood donor cell leukemia with prenatal somatic JAK2 V617F mutation. Leukemia 30, 1756–1759 (2016).

  38. Wong, W. H. et al. Engraftment of rare, pathogenic donor hematopoietic mutations in unrelated hematopoietic stem cell transplantation. Sci. Transl. Med. 12, eaax6249 (2020).

  39. Olcaydu, D. et al. A common JAK2 haplotype confers susceptibility to myeloproliferative neoplasms. Nat. Genet. 41, 450–454 (2009).

  40. Hinds, D. A. et al. Germ line variants predispose to both JAK2 V617F clonal hematopoiesis and myeloproliferative neoplasms. Blood 128, 1121–1128 (2016).

  41. Fleischman, A. G. Inflammation as a driver of clonal evolution in myeloproliferative neoplasm. Mediators Inflamm. 2015, 606819 (2015).

  42. Bick, A. G. et al. Inherited causes of clonal haematopoiesis in 97,691 whole genomes. Nature 586, 763–768 (2020).

  43. Bao, E. L. et al. Inherited myeloproliferative neoplasm risk affects haematopoietic stem cells. Nature 586, 769–775 (2020).

  44. Watson, C. J. et al. The evolutionary dynamics and fitness landscape of clonal hematopoiesis. Science 367, 1449–1454 (2020).

  45. Arber, D. A. et al. The 2016 revision to the World Health Organization classification of myeloid neoplasms and acute leukemia. Blood 127, 2391–2405 (2016).

  46. Nielsen, C., Birgens, H. S., Nordestgaard, B. G. & Bojesen, S. E. Diagnostic value of JAK2 V617F somatic mutation for myeloproliferative cancer in 49 488 individuals from the general population. Br. J. Haematol. 160, 70–79 (2013).

  47. Kiladjian, J. J. et al. Pegylated interferon-α-2a induces complete hematologic and molecular responses with low toxicity in polycythemia vera. Blood 112, 3065–3072 (2008).

  48. Pieri, L. et al. JAK2V617F complete molecular remission in polycythemia vera/essential thrombocythemia patients treated with ruxolitinib. Blood 125, 3352–3353 (2015).

  49. Nangalia, J. et al. Somatic CALR mutations in myeloproliferative neoplasms with nonmutated JAK2. N. Engl. J. Med. 369, 2391–2405 (2013).

  50. Ellis, P. et al. Reliable detection of somatic mutations in solid tissues by laser-capture microdissection and low-input DNA sequencing. Nat. Protoc. 16, 841–871 (2021).

  51. Jones, D. et al. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinforma. 56, 15.10.1–15.10.18 (2016).

  52. Ye, K., Schulz, M. H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–2871 (2009).

  53. Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).

  54. Hoang, D. T. et al. MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation. BMC Evol. Biol. 18, 11 (2018).

  55. Carpenter, B. et al. Stan: a probabilistic programming language. J. Stat. Softw. 76, 1–32 (2017).

  56. Tavaré, S. The linear birth-death process: an inferential retrospective. Adv. Appl. Probab. 50, 253–269 (2018).

Acknowledgements

We thank Cambridge Blood and Stem Cell Biobank, funded by the Cambridge Cancer Centre and Wellcome Trust Cambridge Stem Cell Institute, Wellcome Sanger CASM and DNA pipelines for their assistance; and S. Behjati and C. Harrison for valuable discussion. The study was supported by Cancer Research UK (J.N.), EHA Research Award (J.N.), MPN Research Foundation (J.N.) and the Wellcome Trust (P.J.C., A.R.G. and J.L.). Work in the A.R.G. laboratory is supported by the Wellcome Trust, Bloodwise, Cancer Research UK, the Kay Kendall Leukaemia Fund and the Leukaemia and Lymphoma Society of America. J.N. is a CRUK Clinician Scientist fellow. We thank the patients for their participation in the study.

Author information

Authors and Affiliations



Contributions

J.N., A.R.G. and P.J.C. conceived the study. N.W. performed genomic, phylogenetic and population dynamics analyses with J.N. J.L. assisted with signature, clinical and telomere analyses. E.M. provided genomic data and analyses for normal samples. L.M. assisted with low-input sequencing and mutation signature analysis. A.L.G. assisted with clinical correlation. J.N. and E.J.B. obtained samples. K.J.D. assisted with simulation inferences. A.M. and J.H. assisted with computational and laboratory processing pipelines. J.N. directed the study and wrote the manuscript with input from co-authors. All authors reviewed and approved the manuscript.

Corresponding author

Correspondence to Jyoti Nangalia.

Ethics declarations

Competing interests

A patent has been filed by the Wellcome Sanger Institute (inventors N.W. and J.N.; Application number PCT/EP2021/071952) covering somatic mutation identification in the context of tumour contamination of the matched germline sample.

Peer review information

Nature thanks Steven McCarroll, Seishi Ogawa and the other, anonymous reviewers for their contribution to the peer review of this work. Peer review reports are available.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1: Patient characteristics and somatic mutation fractions in haematopoietic colonies.

a. Patient characteristics. PV, polycythemia vera; ET, essential thrombocythaemia; MF, myelofibrosis; HC, hydroxycarbamide; IFN, interferon-alpha; FU, follow-up. *PV diagnosed on red cell mass study. b. The distribution of variant allele fractions (VAFs) for point mutations pooled across colonies per patient. The mean VAF of individual colonies is shown as red dots. Only autosomal somatic mutations are shown, with those in regions with copy-number aberrations and loss of heterozygosity excluded. The plot shows that the mean VAF is close to 0.5 for the majority of colonies.

Extended Data Fig. 2: Parallel evolution within phylogenetic trees.

a. Phylogenetic tree of PD5117 depicting 3 separate 9pUPD (UPD, uniparental disomy) acquisitions (blue branches), downstream of JAK2V617F (red branch). Below the phylogenetic tree are three B-allele frequency plots showing the regions of 9pUPD in the different clades, with vertical red lines showing the boundary of loss of heterozygosity. The event shown on the far right has a distinct breakpoint from the left two events. Blue and green vertical lines show somatic mutations (either prior or subsequent to the UPD event), suggesting that the 9pUPD event depicted in the middle plot occurred first, as more mutations have had time to accrue since the copy number aberration. b. Phylogenetic tree of PD5179 depicting two separate 1q+ (orange branches) and 9q- (blue branches) acquisitions. The left plot shows the aggregate VAF of germline single nucleotide polymorphisms (SNPs) on Chr1 for samples in the 1q+ major clade versus the 1q+ minor clade. SNPs at a VAF = 2/3 in the major clade are at 1/3 in the minor clade, and vice versa, confirming that different parental chromosomes are amplified in each clade. SNPs in the affected 9q- region also exhibit a clear pattern in VAF (right plot), with VAF = 0.5 for samples in the major 9q- clade but VAF = 0 or 1 for samples in the minor 9q- clade. A proposed model of chr9 copy number changes is shown in the upper right. c. Phylogenetic tree of PD4781 depicting two separate JAK2V617F acquisitions (red branches), each followed by 9pUPD (blue branches). JAK2V617F acquisition occurred on different parental alleles in each instance, as SNPs on 9p that have a VAF ~1 for samples in the major JAK2-mutant clade (horizontal bar coloured red) have a VAF ~0 in samples from the minor JAK2-mutant clade (horizontal bar coloured blue), and vice versa.

Extended Data Fig. 3: Mutation rates and burden following driver mutation acquisition.

a. Mutation rate estimates for wildtype and different mutant clades within patients. Mutation acquisition is modelled as a Poisson process, taking into account the timing of the transition from wildtype to driver mutation acquisition within mutant clades and an excess mutation rate earlier in life (Methods). Patients and genotypes of clades are shown on the left together with the colony number for each clade (N). Wildtype (WT) clades are shown as grey bars, JAK2-mutated clades in red and other mutant clades in yellow. The cohort-wide estimate for the mutation rate in WT colonies is shown by the dotted black vertical line at the top. *P < 0.05, **P < 0.01 (** also significant after multiple hypothesis testing; Bonferroni adjusted, two-sided test). Significantly different mutation rates between clades are highlighted only for those significant by both Poisson and Negative Binomial modelling of mutation rates (Methods). Average mutation burdens are shown to the right for the different timepoints of sampling. b. Non-parametric comparison of mutation burdens in wildtype versus mutant colonies using limma’s rankSumTestWithCorrelation. This accounts for the non-independence of data in mutant colonies but does not account for the timing of driver mutation acquisition. * indicates significance at P < 0.05 following Bonferroni multiple hypothesis correction.

Extended Data Fig. 4: Mutational signatures and telomeres.

a. Signature contributions of SBS1, SBS5, SBS19 and SBS32 on a per-patient/per-clade basis. Single base substitution mutational signature 5 (SBS5), thought to represent a time-dependent mutational process active in all tissues, was the predominant mutational process in colonies. b. The proportion of C>T transitions at CpG dinucleotides across WT colonies, JAK2-mutated colonies and colonies with other driver mutations. *P < 0.05, **P < 0.01 (** also significant after multiple hypothesis testing; Chi-square test). c. The relationship between ‘sharedness’ (see Methods) and telomere length across all phylogenetic trees shows that telomeres shorten in line with increased phylogenetic ‘sharedness’, in keeping with the increased cell divisions during clonal expansion. d. The heritability of telomere length, that is, whether closely related colonies had more similar telomere lengths compared to more distantly related colonies, is assessed using Pagel’s Lambda and Blomberg’s K, with both values in the vicinity of 1 or above, suggesting that telomere length variation across colonies in a phylogenetic tree follows the expected covariance based on phylogenetic relationship. Power for PD5147 is limited because there is little difference in ‘sharedness’ in the mutant colonies. e. The modelled reduction in telomere length per additional stem cell division in JAK2-mutant clades is shown per patient, with a cohort-wide estimate of −57.4 bp (95% CI −74.2 to −40.59). See Supplementary Note 7 for further interpretation.

Extended Data Fig. 5: Time-based trees and timing of driver mutation acquisition.

a. Time-based phylogenetic trees. Different coloured branches identify separate clades alongside light blue wild-type colonies. The vertical axis represents age post conception, with treatment received shown alongside. Driver mutations are depicted in the middle of the branches but may have occurred at any point between the start and end of the branches. Given the uncertainties in the exact ages at the starts and ends of the branches due to modelling branch lengths from mutation count data (Methods), the credibility intervals for the ends of the branches harbouring driver mutations are shown as black lines and also in b-c. b. Each horizontal grey box represents an individual patient from birth until the last colony sampling timepoint. The time before birth is represented on an expanded scale and is shaded pink. Within each grey box is shown the range of ages during which driver mutations and copy number aberrations are estimated to have occurred. The start and end of each coloured box represent the median lower and upper bounds of time estimates corresponding to the start and end of the shared branches harbouring driver mutations. Thus, the upper bounds (right edge of the coloured boxes) represent the latest time by which mutation acquisition is estimated to have occurred from phylogenetic analysis. Black lines show the 95% credibility intervals for the start and end of the branches carrying the drivers. Mutation timings are inferred from a model where mutation accumulation within branches follows a Poisson distribution, but were not substantially different when using a Negative Binomial model. Diamonds show age at diagnosis. c. Raw data from a-b are shown with 95% credibility intervals around the estimated ages of the starts and ends of branches harbouring driver mutations for different patients, together with adjusted SNV counts for branches.

Extended Data Fig. 6: Estimates of clonal expansion rates in patients.

a. The figure shows the smoothed posterior density distribution of the selection coefficient (proportion additional growth per year) versus driver timing for all analysed clades from population simulations and approximate Bayesian computation (ABC). Marginal distributions are also shown. The prior distribution for driver timing is clade dependent and is largely determined by the mutation count at the start and end of the associated branch. Both clonal fractions and lineages-through-time were used as summary statistics in the approximate Bayesian computation for estimates of selection. Main plots show driver mutations acquired after birth, and driver mutations acquired before birth are shown within the black box, taking into account driver mutation acquisition during a time when the background stem cell population size is modelled to be growing. b. Data from a. in tabular format. Here, selection coefficients have been converted to clonal expansion (median growth % per year, Selection). The ABC approach gives alternative estimates for ages of driver mutation acquisition as shown. N depicts the number of simulations per clade. Clones with sufficient immediate descendants (>5 coalescences) were included for estimates of selection. c. Comparison of estimates of selection of mutant clades (each labelled by patient ID and driver mutation) from ABC versus Phylofit. The grey lines show 95% credibility intervals for estimates from each approach. Correlation coefficient r = 0.96. Note that the PD5182 and PD5847 in utero DNMT3A expansions from panel a. are not shown because only the ABC approach, and not Phylofit, allowed for modelling selection against a growing background population.
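
The rejection form of approximate Bayesian computation mentioned in this legend can be illustrated with a deliberately small sketch. It is not the rsimpop-based procedure used in the study: it assumes a deterministic logistic growth model, flat priors on the growth rate and driver timing, a fixed stem cell pool size (`N_HSC`), and a single summary statistic (the number of mutant colonies), whereas the study also used lineages-through-time. All function and parameter names are hypothetical.

```python
import math
import random

N_HSC = 100_000  # assumed stem cell pool size, for illustration only


def clone_fraction(s, driver_age, sample_age, n_hsc=N_HSC):
    """Logistic clone growth from one mutant cell; s is an assumed growth rate per year."""
    odds = math.exp(s * (sample_age - driver_age)) / (n_hsc - 1)
    return odds / (1.0 + odds)


def abc_rejection(mutant_colonies, total_colonies, sample_age,
                  n_sims=5000, keep_fraction=0.01, seed=1):
    """Return the (s, driver_age) draws whose simulated colony counts best match the data."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        s = rng.uniform(0.0, 2.0)                  # assumed flat prior on growth rate
        driver_age = rng.uniform(0.0, sample_age)  # assumed flat prior on driver timing
        frac = clone_fraction(s, driver_age, sample_age)
        # Simulate picking colonies from a population with this clonal fraction.
        sim_mutant = sum(rng.random() < frac for _ in range(total_colonies))
        draws.append((abs(sim_mutant - mutant_colonies), s, driver_age))
    draws.sort(key=lambda d: d[0])                 # smallest distance first
    n_keep = max(1, int(keep_fraction * n_sims))
    return [(s, age) for _, s, age in draws[:n_keep]]


posterior = abc_rejection(mutant_colonies=40, total_colonies=100, sample_age=60.0)
median_s = sorted(s for s, _ in posterior)[len(posterior) // 2]
print(f"accepted {len(posterior)} draws; median growth rate = {median_s:.2f} per year")
```

With clonal fraction as the only summary statistic, the accepted draws trade off growth rate against driver timing (a fast late clone and a slow early clone fit equally well), which is one reason a second statistic such as the branching pattern of the tree is informative.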

Extended Data Fig. 7: Aberrant cell fractions in bulk blood samples and validation of selection estimates.

a. Plots showing aberrant cell fraction (ACF) in colonies and bulk longitudinal mature blood cell samples. Colony samples were derived from peripheral blood (red dots) or bone marrow (orange dots, in PD5182 and PD5847) mononuclear cells. Bulk mature blood cell samples comprised mostly peripheral blood granulocytes (black dots) and, occasionally, bone marrow-derived granulocytes (grey dots, in PD5847 and PD6629) or mononuclear cells (in PD5182), and whole blood (brown dots, in PD9478 and PD6629). ACF in colonies is the clone's proportion of all colonies. In bulk samples, ACF is calculated as twice the mean VAF of variants that map to the shared ancestral branch of the clone. The x-axis is patient age at sample timepoints. Lines depict the inferred ACF trajectories from the top 0.01% of simulations from approximate Bayesian computation. Black lines, median ACF; grey lines, 95% CI; dotted line, inferred future growth trajectory beyond the sampling time using the growth rate S and accounting for a sigmoid clonal trajectory as clonal dominance is approached. b. 95% confidence intervals for the difference in parent branch and aggregate descendant daughter branch ACFs from phylogenetic tree clades. Confidence intervals are calculated assuming a normal sampling distribution of aggregate mutant read fractions for each branch. Diamonds indicate those recapture samples closest to the colony sampling. * denotes interferon treatment at time of sampling. c. Comparison of estimates of selection coefficients for clades with single driver mutations using Phylofit fitted using the branching pattern within the tree (lineage through time, LTT) and ACF (horizontal axis), versus selection coefficient estimates using just the branching pattern of the tree (LTT) and no ACF (vertical axis), to identify clades that show early rapid branching but smaller than expected final clonal fractions. 95% credibility intervals for selection coefficients are shown as grey lines and the corresponding median estimates as black dots. Possible early faster expansions are seen in two in utero mutated-DNMT3A clades (PD5182 and PD5847) and the JAK2V617F clade in PD5163 prior to interferon therapy.
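
The bulk-sample ACF calculation described in panel a can be written out as a short sketch. The doubling of the mean VAF is taken from the legend; treating it as appropriate for heterozygous variants in copy-neutral regions is an assumption here, as is the logistic form used for the projected trajectory (the legend states only that the trajectory is sigmoid). The function names and the example read counts are hypothetical.

```python
def bulk_acf(alt_reads, depths):
    """ACF as twice the mean VAF of variants on the clone's shared ancestral branch."""
    vafs = [alt / depth for alt, depth in zip(alt_reads, depths) if depth > 0]
    if not vafs:
        raise ValueError("no variants with coverage")
    # Cap at 1 because sampling noise can push twice the mean VAF above 100%.
    return min(1.0, 2.0 * sum(vafs) / len(vafs))


def projected_acf(age, age_sampled, acf_sampled, growth_per_year):
    """Assumed logistic projection: the odds of the clone grow by (1 + S) each year."""
    if not 0.0 < acf_sampled < 1.0:
        raise ValueError("acf_sampled must lie strictly between 0 and 1")
    odds = acf_sampled / (1.0 - acf_sampled)
    odds *= (1.0 + growth_per_year) ** (age - age_sampled)
    return odds / (1.0 + odds)


# Made-up read counts for three branch variants in one bulk sample.
acf_now = bulk_acf(alt_reads=[12, 18, 15], depths=[100, 120, 100])
for years_ahead in (0, 5, 10):
    future = projected_acf(50 + years_ahead, 50, acf_now, growth_per_year=0.3)
    print(f"+{years_ahead} years: ACF = {future:.2f}")
```

Working on the odds rather than the fraction itself keeps the projection below 1 and produces the flattening near clonal dominance that the dotted lines in the figure are described as showing.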

Supplementary information

Supplementary Information

This file contains Supplementary Notes 1–8, including 12 figures, Table 1 and references.

Reporting Summary

Peer Review File

Cite this article

Williams, N., Lee, J., Mitchell, E. et al. Life histories of myeloproliferative neoplasms inferred from phylogenies. Nature 602, 162–168 (2022).
