Population dynamics of normal human blood inferred from somatic mutations


Haematopoietic stem cells drive blood production, but their population size and lifetime dynamics have not been quantified directly in humans. Here we identified 129,582 spontaneous, genome-wide somatic mutations in 140 single-cell-derived haematopoietic stem and progenitor colonies from a healthy 59-year-old man and applied population-genetics approaches to reconstruct clonal dynamics. Cell divisions from early embryogenesis were evident in the phylogenetic tree; all blood cells were derived from a common ancestor that preceded gastrulation. The size of the stem cell population grew steadily in early life, reaching a stable plateau by adolescence. We estimate the numbers of haematopoietic stem cells that are actively making white blood cells at any one time to be in the range of 50,000–200,000. We observed adult haematopoietic stem cell clones that generate multilineage outputs, including granulocytes and B lymphocytes. Harnessing naturally occurring mutations to report the clonal architecture of an organ enables the high-resolution reconstruction of somatic cell dynamics in humans.

Fig. 1: Experimental design.
Fig. 2: The phylogeny of cells, showing the relationship between cell types and embryological cell divisions.
Fig. 3: Population size trajectory of the stem cell pool.
Fig. 4: Recapture of mutations by targeted sequencing.
Fig. 5: Approximate Bayesian computation of the number of stem cells and their replication rate.
Fig. 6: Targeted sequencing of granulocyte and lymphocyte samples.

Data availability

Whole-genome and targeted sequencing data have been deposited in the European Genome-Phenome Archive (EGA). Whole-genome sequencing data have been deposited with EGA accession number EGAD00001004086 and targeted sequencing data with accession number EGAD00001004087. Substitution calls have been deposited on Mendeley Data (‘Population dynamics of human blood inferred from spontaneous somatic mutations’). Simulated datasets (from the approximate Bayesian computation) are available from the corresponding authors upon reasonable request.


References

  1. Till, J. E. & McCulloch, E. A. A direct measurement of the radiation sensitivity of normal mouse bone marrow cells. Radiat. Res. 14, 213–222 (1961).
  2. Becker, A. J., McCulloch, E. A. & Till, J. E. Cytological demonstration of the clonal nature of spleen colonies derived from transplanted mouse marrow cells. Nature 197, 452–454 (1963).
  3. Lemischka, I. R., Raulet, D. H. & Mulligan, R. C. Developmental potential and dynamic behavior of hematopoietic stem cells. Cell 45, 917–927 (1986).
  4. Kim, S. et al. Dynamics of HSPC repopulation in nonhuman primates revealed by a decade-long clonal-tracking study. Cell Stem Cell 14, 473–485 (2014).
  5. Naik, S. H. et al. Diverse and heritable lineage imprinting of early haematopoietic progenitors. Nature 496, 229–232 (2013).
  6. Koelle, S. J. et al. Quantitative stability of hematopoietic stem and progenitor cell clonal output in rhesus macaques receiving transplants. Blood 129, 1448–1457 (2017).
  7. Abkowitz, J. L., Catlin, S. N. & Guttorp, P. Evidence that hematopoiesis may be a stochastic process in vivo. Nat. Med. 2, 190–197 (1996).
  8. Sun, J. et al. Clonal dynamics of native haematopoiesis. Nature 514, 322–327 (2014).
  9. Busch, K. et al. Fundamental properties of unperturbed haematopoiesis from stem cells in vivo. Nature 518, 542–546 (2015).
  10. Sawai, C. M. et al. Hematopoietic stem cells are the major source of multilineage hematopoiesis in adult animals. Immunity 45, 597–609 (2016).
  11. Bystrykh, L. V., Verovskaya, E., Zwart, E., Broekhuis, M. & de Haan, G. Counting stem cells: methodological constraints. Nat. Methods 9, 567–574 (2012).
  12. Fialkow, P. J., Gartler, S. M. & Yoshida, A. Clonal origin of chronic myelocytic leukemia in man. Proc. Natl Acad. Sci. USA 58, 1468–1471 (1967).
  13. Fialkow, P. J., Jacobson, R. J. & Papayannopoulou, T. Chronic myelocytic leukemia: clonal origin in a stem cell common to the granulocyte, erythrocyte, platelet and monocyte/macrophage. Am. J. Med. 63, 125–130 (1977).
  14. Cartier, N. et al. Hematopoietic stem cell gene therapy with a lentiviral vector in X-linked adrenoleukodystrophy. Science 326, 818–823 (2009).
  15. Biasco, L. et al. In vivo tracking of human hematopoiesis reveals patterns of clonal dynamics during early and steady-state reconstitution phases. Cell Stem Cell 19, 107–119 (2016).
  16. Notta, F. et al. Distinct routes of lineage development reshape the human blood hierarchy across ontogeny. Science 351, aab2116 (2016).
  17. Werner, B. et al. Reconstructing the in vivo dynamics of hematopoietic stem cells from telomere length distributions. eLife 4, e08687 (2015).
  18. Catlin, S. N., Busque, L., Gale, R. E., Guttorp, P. & Abkowitz, J. L. The replication rate of human hematopoietic stem cells in vivo. Blood 117, 4460–4466 (2011).
  19. Blokzijl, F. et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016).
  20. Welch, J. S. et al. The origin and evolution of mutations in acute myeloid leukemia. Cell 150, 264–278 (2012).
  21. Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
  22. Behjati, S. et al. Genome sequencing of normal cells reveals developmental lineages and mutational processes. Nature 513, 422–425 (2014).
  23. Bae, T. et al. Different mutational rates and mechanisms in human cells at pregastrulation and neurogenesis. Science 359, 550–555 (2018).
  24. Notta, F. et al. Isolation of single human hematopoietic stem cells capable of long-term multilineage engraftment. Science 333, 218–221 (2011).
  25. Zink, F. et al. Clonal hematopoiesis, with and without candidate driver mutations, is common in the elderly. Blood 130, 742–752 (2017).
  26. Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).
  27. Xie, M. et al. Age-related mutations associated with clonal hematopoietic expansion and malignancies. Nat. Med. 20, 1472–1478 (2014).
  28. McKerrell, T. et al. Leukemia-associated somatic mutations drive distinct patterns of age-related clonal hemopoiesis. Cell Rep. 10, 1239–1245 (2015).
  29. Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041 (2017).
  30. Plusa, B. et al. The first cleavage of the mouse zygote predicts the blastocyst axis. Nature 434, 391–395 (2005).
  31. Ju, Y. S. et al. Somatic mutations reveal asymmetric cellular dynamics in the early human embryo. Nature 543, 714–718 (2017).
  32. Pei, W. et al. Polylox barcoding reveals haematopoietic stem cell fates realized in vivo. Nature 548, 456–460 (2017).
  33. Rahbari, R. et al. Timing, rates and spectra of human germline mutation. Nat. Genet. 48, 126–133 (2016).
  34. Lan, S., Palacios, J. A., Karcher, M., Minin, V. N. & Shahbaba, B. An efficient Bayesian inference framework for coalescent-based nonparametric phylodynamics. Bioinformatics 31, 3282–3289 (2015).
  35. Young, A. L., Challen, G. A., Birmann, B. M. & Druley, T. E. Clonal haematopoiesis harbouring AML-associated mutations is ubiquitous in healthy adults. Nat. Commun. 7, 12484 (2016).
  36. Lodato, M. A. et al. Aging and neurodegeneration are associated with increased mutations in single human neurons. Science 359, 555–559 (2018).
  37. Schlenner, S. M. et al. Fate mapping reveals separate origins of T cells and myeloid lineages in the thymus. Immunity 32, 426–436 (2010).


This work was supported by the Leukemia Lymphoma Society, the WBH Foundation and the Wellcome Trust. P.J.C. is a Wellcome Trust Senior Clinical Fellow (WT088340MA). H.L.-S. is a recipient of a Wellcome Trust PhD studentship. N.F.Ø. is the recipient of a Danish Lundbeck Fellowship (2016-17) and M.S.S. is the recipient of a BBSRC CASE Industrial PhD Studentship. Work in the D.G.K. laboratory is supported by a Bloodwise Bennett Fellowship (15008), a European Research Council Starting Grant (ERC-2016-STG–715371) and a European Hematology Association Non-Clinical Advanced Research Fellowship. Work in the A.R.G. laboratory is supported by the Wellcome Trust, Bloodwise, Cancer Research UK, the Kay Kendall Leukaemia Fund, and the Leukemia and Lymphoma Society of America. Work in the E.L. laboratory is supported by a Wellcome Trust Sir Henry Dale Fellowship, BBSRC and a European Hematology Association Non-Clinical Advanced Research Fellowship. The D.G.K., E.L. and A.R.G. laboratories are supported by a core support grant from the Wellcome Trust and Medical Research Council to the Cambridge Stem Cell Institute. We acknowledge further assistance from the National Institute for Health Research Cambridge Biomedical Research Centre and the Cambridge Experimental Cancer Medicine Centre. We thank R. Grenfell and M. Strzelecki in the Flow Cytometry Facility of the Cancer Research UK Cambridge Institute for technical assistance and suggestions; C. Eaves and P. Beer for discussions about the experimental design; P. Goodell for discussions about the results; J. Grinfeld and C. Gonzalez-Arias for drawing peripheral blood; and G. Collord for providing negative control samples for targeted sequencing.

Reviewer information

Nature thanks H.-R. Rodewald, L. Shlush and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

P.J.C., D.G.K., A.R.G. and H.L.-S. designed the experiments. B.J.P.H. performed the bone marrow aspiration experiments. D.G.K., N.F.Ø., M.S.S. and M.B. cultured HSPCs, with advice and reagents from E.L., and isolated cells from subsequent blood draws. E.A. and L.O. oversaw DNA extraction and library preparation. H.L.-S. performed most of the data curation and statistical analysis. S.G. helped to call mutations and construct the phylogeny. I.M. tested for positive selection. M.R.S. oversaw the analysis of mutational signatures. R.J.O. and P.J.C. separated signal from noise in targeted sequencing analysis. K.D. helped to design the approximate Bayesian computation along with P.J.C. and H.L.-S. and co-wrote the supplementary results on inference of population size. H.L.-S., P.J.C., D.G.K. and A.R.G. wrote the manuscript, with contributions from all authors.

Corresponding authors

Correspondence to Anthony R. Green, David G. Kent or Peter J. Campbell.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Cell sorting strategy.

a, Sorting of stem and progenitor cells. Human bone marrow (BM) and peripheral blood (PB) mononuclear cells (time point 1) were stained with anti-CD34, anti-CD38, anti-CD45RA, anti-CD90, anti-CD10 and anti-CD135 antibodies. After exclusion of debris and doublets, gatings on CD34, CD38 and CD90 were used to separate CD34+CD38−CD90+CD45RA− HSCs. The CD34+CD38+ compartment was gated for CD10− cells before gating on CD135 (also known as FLT3) and CD45RA to separate progenitor compartments: CD135+CD45RA− common myeloid progenitors (CMPs), CD135+CD45RA+ granulocyte–macrophage progenitors (GMPs) and CD135−CD45RA− megakaryocyte–erythrocyte progenitors (MEPs). b, Sorting of B and T lymphocytes. Peripheral blood mononuclear cells (time point, after 4 months) were stained with anti-CD4, anti-CD8 and anti-CD19 antibodies. After exclusion of debris and doublets, the CD4+CD8+CD19− gate was used to isolate T cells, while the CD4−CD8−CD19+ gate was used to isolate B cells. n = 20,000 events.

Extended Data Fig. 2 Quality control of colonies from single-cell derived clones.

Example histograms of the variant allele fraction (VAF; the proportion of sequencing reads that report the mutation) of mutations in single colonies. a, The VAF of all mutations on autosomes in a typical clonal colony. Because there are two copies of each autosome, and each mutation occurs on only one of them, in a clonal sample the VAF of autosomal mutations is binomially distributed with a mean of 0.5. b, The VAF of all mutations on the X chromosome in the same clonal colony. Because the subject is male, there is only one copy of the X chromosome, and so true mutations here must have a VAF of 1. Occasionally, lower VAFs are seen when a mutation is not detected on a read, when a read from another locus is aberrantly mapped to the locus in question (thus lowering the apparent coverage), or when a mutation is acquired in vitro. c, d, The VAF of autosomal and X chromosome mutations, respectively, in a typical colony seeded by more than one cell. As not all the reads come from the same cell, and most mutations are private to a given cell, a lower proportion of DNA molecules carry the mutation in a polyclonal colony than in a clonal colony, resulting in a leftward shift of the peak of the VAF histogram. These histograms suggest that the number of mutations acquired by the colonies after a few weeks of in vitro expansion is a small fraction of those acquired in vivo over 60 years of life.
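The expected VAF peaks described in this legend can be illustrated with a small simulation. This is a minimal sketch, not the authors' quality-control code: the function name `simulate_vafs`, the read depths and the 50% mixture are illustrative assumptions, and each read is modelled as an independent draw with no sequencing error.

```python
import random
import statistics

def simulate_vafs(n_mutations, depth, clonal_fraction=1.0, copies=2, seed=0):
    """Simulate VAFs for mutations carried by `clonal_fraction` of the cells.

    Each mutation sits on one of `copies` chromosome copies, so the expected
    VAF is clonal_fraction / copies: 0.5 for a clonal autosomal mutation and
    1.0 for a clonal X-chromosome mutation in a male.
    """
    rng = random.Random(seed)
    expected = clonal_fraction / copies
    vafs = []
    for _ in range(n_mutations):
        # Binomial sampling of reads: each read reports the mutation
        # with probability `expected`.
        mutant_reads = sum(rng.random() < expected for _ in range(depth))
        vafs.append(mutant_reads / depth)
    return vafs

# Hypothetical depths; the real coverage is given in Supplementary Table 1.
clonal_autosomal = simulate_vafs(1000, depth=30)
clonal_x = simulate_vafs(1000, depth=15, copies=1)
# A colony seeded by two cells: a private mutation is in half of the cells.
polyclonal_autosomal = simulate_vafs(1000, depth=30, clonal_fraction=0.5)

print(round(statistics.mean(clonal_autosomal), 2))
print(round(statistics.mean(polyclonal_autosomal), 2))
```

Plotting histograms of the three lists would show the peak near 0.5 for the clonal autosomal case, the point mass at 1 for the X chromosome and the leftward shift for the polyclonal colony.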

Extended Data Fig. 3 Mutation burden of colonies.

a, A histogram of substitution (left) and indel (right) burden per colony. b, The location around the genome of substitutions from all clones combined is shown as a circos plot. The outermost ring of the circos plot depicts the karyotypic ideogram. Moving inwards, base substitutions are shown as rainfall plots in which the height of the dot in the substitution ring is proportional to the log10 of the distance to the next mutation and with the colour of the dot illustrating the base change, as shown in the key. c, A comparison of the substitution burden between stem cells and progenitor cells. There were not significantly more mutations in progenitors than stem cells (P = 0.14, Wilcoxon rank-sum test).
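The rainfall-plot coordinate described in panel b (height proportional to the log10 of the distance to the next mutation) can be computed as follows. This is an illustrative sketch for one chromosome; the function name `rainfall_heights` and the example positions are made up for the demonstration.

```python
import math

def rainfall_heights(positions):
    """Return (position, log10 distance to the next mutation) pairs.

    `positions` are mutation coordinates on a single chromosome. The last
    mutation has no successor and is therefore omitted, as are exact
    duplicate positions (distance zero has no logarithm).
    """
    ordered = sorted(positions)
    return [
        (current, math.log10(following - current))
        for current, following in zip(ordered, ordered[1:])
        if following > current
    ]

# Hypothetical coordinates: two mutations 10 bp apart form a low point,
# which is how clustered mutations stand out on a rainfall plot.
example = rainfall_heights([100, 1_100, 1_110, 101_110])
for position, height in example:
    print(position, round(height, 2))
```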

Extended Data Fig. 4 Trinucleotide context of mutations in normal blood colonies.

a, The trinucleotide context of substitutions for all colonies combined. Substitutions can be classed according to the base change (referred to by the pyrimidine of the mutated base pair), and the bases 5′ and 3′ of the mutated one, into 96 categories. The counts in each of these categories are shown. b, Comparison with pooled acute myeloid leukaemia genomes, excluding genomes with >1,500 mutations, and publicly available data on normal tissues that have been whole-genome sequenced so far. The ordering of bars is the same as in a, and the same figure as in a is provided again at the same resolution of the other data for ease of comparison. Please note that these samples have been sequenced on different platforms using different systems, which is likely to result in small differences. Normal liver, normal colon and normal small intestine data were obtained from whole-genome sequencing of single-cell-derived organoids (ref. 19), whereas normal neurons were derived from single cells that had undergone whole-genome amplification (ref. 37). c, Example trinucleotide substitution plots for a selection of individual colonies derived from either stem cells (which have the prefix BMH) or progenitor cells (which have the prefix BMP). The ordering of bars is the same as in a.
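The 96-category classification described in panel a can be sketched in a few lines. The label format `A[C>T]G` is a common convention rather than something specified in this legend, and the function names are illustrative. A substitution whose reference base is a purine is reported on the opposite strand, so that every class is named by the pyrimidine of the mutated base pair.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(sequence):
    return "".join(COMPLEMENT[base] for base in reversed(sequence))

def trinucleotide_class(context, alt):
    """Classify a substitution from its reference trinucleotide and mutant base.

    `context` is the 3-base reference sequence centred on the mutated base.
    """
    ref = context[1]
    if ref == alt:
        raise ValueError("mutant base equals reference base")
    if ref in "AG":
        # Purine reference: describe the change on the complementary strand.
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
        ref = context[1]
    return f"{context[0]}[{ref}>{alt}]{context[2]}"

# Six pyrimidine base changes x 16 flanking-base combinations = 96 classes.
BASE_CHANGES = [("C", "A"), ("C", "G"), ("C", "T"), ("T", "A"), ("T", "C"), ("T", "G")]
ALL_CLASSES = [
    f"{five}[{ref}>{alt}]{three}"
    for ref, alt in BASE_CHANGES
    for five, three in product("ACGT", repeat=2)
]

# Hypothetical calls: (reference trinucleotide, mutant base).
calls = [("ACG", "T"), ("CGT", "A"), ("TAG", "G")]
spectrum = Counter(trinucleotide_class(context, alt) for context, alt in calls)
print(spectrum)
```

The first two hypothetical calls are the same event seen from opposite strands, so they fall into the same class.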

Extended Data Fig. 5 Construction of the phylogeny using different methods.

a, The phylogeny of cells as presented in Figs. 2, 4, 6, but with the addition of P values next to every node, derived by bootstrapping the substitution matrix 1,000 times, building a tree using SCITE for each replicate, and counting the proportion of the bootstrapped trees that support each node. b–f, Phylogenies constructed using different datasets and methods. In each case the phylogeny was constructed using 100 bootstraps of the data, and the P value for each node is shown underneath it. Branches are coloured by whether a branch ancestral to exactly the same descendants is also present in the SCITE tree, and are drawn with a thicker line if the branch is recovered in ≥70% of bootstrap replicates. b, Substitution and indel datasets combined, building the tree by maximum parsimony. c, Substitution, indel and neighbour-joining datasets combined, building the tree by neighbour joining. d, Substitutions, tree built by maximum parsimony. e, Indels, tree built by maximum parsimony. f, Short tandem repeats, tree built by neighbour joining.
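The bootstrap procedure in panel a (resample the mutation matrix, rebuild the tree, count how often each node reappears) can be sketched as below. This is a deliberately simplified stand-in and does not use SCITE, maximum parsimony or neighbour joining: it assumes error-free genotypes, in which case every shared mutation directly defines a clade, so "building the tree" reduces to collecting those clades. The function names and the toy matrix are hypothetical.

```python
import random
from collections import Counter

def clades(mutations):
    """Return the clades implied by a list of genotypes.

    Each genotype is a tuple of 0/1 calls, one per colony. A mutation shared
    by at least two colonies, but not by all of them, defines a clade.
    """
    found = set()
    for genotype in mutations:
        members = frozenset(i for i, call in enumerate(genotype) if call)
        if 1 < len(members) < len(genotype):
            found.add(members)
    return found

def bootstrap_support(mutations, n_replicates=1000, seed=0):
    """Fraction of bootstrap replicates in which each observed clade recurs."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_replicates):
        # Resample mutations (rows of the matrix) with replacement.
        resampled = rng.choices(mutations, k=len(mutations))
        counts.update(clades(resampled))
    return {clade: counts[clade] / n_replicates for clade in clades(mutations)}

# Toy matrix for four colonies: five mutations support the clade {0, 1},
# one supports {2, 3}, and three are private to colony 0.
toy = [(1, 1, 0, 0)] * 5 + [(0, 0, 1, 1)] + [(1, 0, 0, 0)] * 3
support = bootstrap_support(toy)
for clade, value in sorted(support.items(), key=lambda item: -item[1]):
    print(sorted(clade), value)
```

A clade resting on a single mutation is missed whenever that mutation is not drawn, which is why nodes supported by few mutations receive low bootstrap values.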

Extended Data Fig. 6 Relationship between cell types in the phylogeny.

a, The phylogeny showing different stem and progenitor cell types. b, The phylogeny is shown as in a, but with the labels underneath coloured according to which cell types are being compared. The first row of labels has stem cells from bone marrow in red, progenitor cells from bone marrow in grey and stem cells from peripheral blood in black. The second row of labels has stem cells in red and bone marrow progenitors in black. The third row of labels has MEPs in red, CMPs in black, GMPs in blue and stem cells in grey. c–e, Analysis of molecular variance is used to test for clustering on the phylogeny for stem cells derived from peripheral blood versus bone marrow cells (c), stem cells versus progenitors (d) and different progenitor types (e). In each panel, a histogram of the null distribution of the statistic used to detect clustering is shown. Distributions were obtained by randomly permuting which cells were assigned to which category. Comparisons are only between cell types not shown in grey in b. The observed value of the statistic is shown as a red vertical line.
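The permutation scheme in panels c–e (shuffle the cell-type labels, recompute the clustering statistic, compare with the observed value) can be sketched as follows. The statistic here, the mean pairwise distance between cells of the same type, is a simple stand-in chosen for illustration and is not the analysis-of-molecular-variance statistic used in the figure. The distance matrix and labels are invented.

```python
import random
from itertools import combinations

def within_group_mean_distance(distance, labels):
    """Mean pairwise distance between tips that carry the same label."""
    pairs = [
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    ]
    return sum(distance[i][j] for i, j in pairs) / len(pairs)

def permutation_test(distance, labels, n_permutations=2000, seed=0):
    """Null distribution obtained by permuting which tip gets which label."""
    rng = random.Random(seed)
    observed = within_group_mean_distance(distance, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null.append(within_group_mean_distance(distance, shuffled))
    # One-sided: clustering makes within-group distances unusually small.
    p_value = (1 + sum(value <= observed for value in null)) / (1 + n_permutations)
    return observed, null, p_value

# Toy phylogeny: ten tips in two tight clades (distance 1 within, 10 between),
# with cell types that coincide perfectly with the clades.
n_tips = 10
clade_of = [i // 5 for i in range(n_tips)]
distance = [
    [0 if i == j else (1 if clade_of[i] == clade_of[j] else 10) for j in range(n_tips)]
    for i in range(n_tips)
]
labels = ["HSC"] * 5 + ["MEP"] * 5
observed, null, p_value = permutation_test(distance, labels)
print(observed, p_value)
```

A histogram of `null` with a vertical line at `observed` reproduces the layout described in the legend.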

Extended Data Fig. 7 Approximate Bayesian computations.

a, The joint prior distribution for stem cell numbers (HSCs) and the generation time for the first approximate Bayesian computation (ABC). b, The location in sample space of the 10% of simulations that produced summary statistics (using only the ltt summary statistics; Supplementary Methods and Supplementary Information) most similar to the observed summary statistics. c, The joint prior distribution for the second ABC, in the area of sample space indicated to be plausible by the first set of simulations. d, The joint posterior distribution of the best 500 simulations from the second ABC, as shown in Fig. 5 for ease of reference. ‘n’, ‘o’ and ‘p’ on the plot indicate the position in sample space from which panels n–p were drawn. e–i, Cross-validation of the model to choose the number of accepted simulations and the weighting applied to the ltt summary statistics (Supplementary Methods and Supplementary Information). j, For illustrative purposes, five simulations were sampled for each of three population sizes along the plausible diagonal of sample space indicated in b. One set of summary statistics is shown for these simulations in k. k, The red line indicates a simulation coming from the area of sample space indicated by a red point in j; and similarly for blue and green lines. The black dotted line indicates the observed values for these summary statistics. These summary statistics provide a count, for the different numbers of samples (x axis), of how many of the 3,952 mutations that we considered (y axis) are in this many samples with two or more reads, using error model 1 (which simulates errors according to the error rate in control DNA; Supplementary Methods). The same summary statistics were calculated for different mutant read number cut-offs.
l, For each of the 1,000 simulations that produce summary statistics that were the most similar to the observed data, the Euclidean distance from the observed data (y axis) is plotted against the number of stem cells in that simulation (x axis). This information is used by the neural network regression step to define the most likely value for the number of stem cells. The most similar values are seen at around 100,000 stem cells, which was the location of the median of the posterior distribution from neural network regression. m, The observed phylogeny, with branch points indicated by asterisks. n–p, Phylogenies drawn from simulations that occur at the points in sample space indicated in d. n, A relatively plausible simulation, since the pattern of branch points is not dissimilar from the pattern of the observed phylogeny (m). Simulations with smaller stem cell populations and faster stem cell turnover rates resulted in phylogenies in which the stem cells were very closely related to each other (o), whereas those with larger populations and slower turnover result in phylogenies in which the stem cells only share an embryonic common ancestor, and no branches are seen through the tree (p).
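The rejection step of an approximate Bayesian computation (draw parameters from a prior, simulate, keep the simulations whose summary statistics lie closest to the observed ones by Euclidean distance) can be sketched with a toy model. Everything below is an assumption made for illustration: the simulator is a constant-size coalescent measured in generations, only the population size is inferred (the generation time is not modelled separately), the prior is log-uniform between 10^3 and 10^6, the summary statistics are lineages-through-time counts at four arbitrary checkpoints, and the neural network regression adjustment mentioned in panel l is omitted.

```python
import math
import random
import statistics

def lineages_through_time(n_samples, pop_size, checkpoints, rng):
    """Ancestral lineages remaining at each checkpoint (generations back)."""
    k, t = n_samples, 0.0
    pending = sorted(checkpoints)
    ltt, idx = [], 0
    while idx < len(pending):
        if k > 1:
            # Waiting time to the next coalescence among k lineages.
            t_next = t + rng.expovariate(k * (k - 1) / (2 * pop_size))
        else:
            t_next = math.inf
        while idx < len(pending) and pending[idx] < t_next:
            ltt.append(k)
            idx += 1
        t, k = t_next, k - 1
    return ltt

def abc_rejection(observed, n_simulations, accept_fraction,
                  n_samples, checkpoints, seed=0):
    """Return the population sizes of the simulations closest to `observed`."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_simulations):
        pop_size = 10 ** rng.uniform(3, 6)  # assumed log-uniform prior
        simulated = lineages_through_time(n_samples, pop_size, checkpoints, rng)
        draws.append((math.dist(simulated, observed), pop_size))
    draws.sort()
    n_accept = max(1, int(accept_fraction * n_simulations))
    return [pop_size for _, pop_size in draws[:n_accept]]

# Pseudo-observed data generated from a known size, so the answer can be checked.
CHECKPOINTS = [1_000, 5_000, 20_000, 50_000]
TRUE_SIZE = 50_000
observed = lineages_through_time(140, TRUE_SIZE, CHECKPOINTS, random.Random(1))
accepted = abc_rejection(observed, 2_000, 0.05, 140, CHECKPOINTS)
print(observed, round(statistics.median(accepted)))
```

Small populations coalesce quickly and leave few lineages at the first checkpoint, whereas large populations leave almost all of them, which mirrors the contrast between panels o and p.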

Extended Data Fig. 8 Targeted sequencing data.

a, Correlations between the VAFs of all sequenced samples, shown on a log scale. Note that samples that were sequenced to a lower depth cannot have VAFs as small as samples sequenced to higher depths. b, Targeted sequencing information with no error correction. The data are shown as in Fig. 4 for all analysed samples, but focusing on only the first 350 mutations of molecular time. To allow a better comparison between samples that were sequenced at different depths, a higher detection threshold and a different colour scale are used relative to Fig. 4. c, Targeted sequencing information after using cord blood controls for sequencing error correction with the Bayesian generalized Poisson mixed-effects model. The colour scale is the same as in b. The data for the granulocytes at the nine-month time point are the same as in Fig. 4 (provided again for ease of comparison), but plotted with a different colour scale.

Extended Data Fig. 9 Multilineage clonal output.

a, The phylogeny with targeted sequencing information in different blood fractions overlaid as in Fig. 6, shown again here for ease of reference. The colouring of mutations reflects in which peripheral blood cell fractions they could be detected, as indicated by the colour key. Arrows indicate adult clones with multilineage output, with letters corresponding to panels b–f. B, B lymphocytes; G, granulocytes; G low VAF, granulocytes, allele fraction too low to be detected in lymphocytes; T, T lymphocytes. b–f, VAFs of all mutations on branches (indicated by arrows in a) with mutations beyond molecular time 100 that are detectable in granulocytes and B lymphocytes but not in T lymphocytes.

Supplementary information

Supplementary Methods

This file contains Supplementary Methods: Detailed methods for data produced in the manuscript.

Reporting Summary

Supplementary Appendix

This file contains Technical appendix: Summary of mathematical basis for inferring numbers of haematopoietic stem cells from capture-recapture data.

Supplementary Table 1

This file contains Supplementary Table 1: Sequencing metrics for single cell-derived haematopoietic colonies.

Supplementary Table 2

This file contains Supplementary Table 2: Posterior estimates of true VAFs for mutations in recapture experiment.

Supplementary Table 3

This file contains Supplementary Table 3: Clonal haematopoiesis hotspots included in bait set.

About this article

Cite this article

Lee-Six, H., Øbro, N.F., Shepherd, M.S. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).

  • Stem Cells
  • Haematopoietic Stem Cell
  • Allele Fraction
  • Molecular Time
  • Clonal Contributions
