  • Protocol
Reconstructing phylogenetic trees from genome-wide somatic mutations in clonal samples

Abstract

Phylogenetic trees are a powerful means to display the evolutionary history of species, pathogens and, more recently, individual cells of the human body. Whole-genome sequencing of laser capture microdissections or expanded stem cells has allowed the discovery of somatic mutations in clones, which can be used as natural barcodes to reconstruct the developmental history of individual cells. Here we describe Sequoia, our pipeline to reconstruct lineage trees from clones of normal cells. Candidate somatic mutations are called against the human reference genome and filtered to exclude germline mutations and artifactual variants. These filtered somatic mutations form the basis for phylogeny reconstruction using a maximum parsimony framework. Lastly, we use a maximum likelihood framework to explicitly map mutations to branches in the phylogenetic tree. The resulting phylogenies can then serve as a basis for many subsequent analyses, including investigating embryonic development, tissue dynamics in health and disease, and mutational signatures. Sequoia can be readily applied to any clonal somatic mutation dataset, including single-cell DNA sequencing datasets, using the commands and scripts provided. Moreover, Sequoia is highly flexible and can be easily customized. Typically, the runtime of the core script ranges from minutes to an hour for datasets with a moderate number (50,000–150,000) of variants. Competent bioinformatic skills, including in-depth knowledge of the R programming language, are required. A high-performance computing cluster (one that is capable of running mutation-calling algorithms and other aspects of the analysis at scale) is also required, especially if handling large datasets.
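As a rough illustration of the germline-filtering step mentioned in the abstract, the sketch below pools read counts for one variant across all clones from an individual and asks whether the pooled variant allele fraction (VAF) is compatible with a heterozygous germline variant on a diploid autosome (expected VAF of 0.5). This is an assumption about how such a filter can work, not a reproduction of Sequoia's own R code. The function names, the one-sided exact binomial test and the `alpha` threshold are hypothetical choices made for this example.

```python
from math import exp, lgamma, log


def log_binom_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def binom_cdf(k, n, p):
    """P(X <= k), summed in log space so large pooled depths do not overflow."""
    return min(1.0, sum(exp(log_binom_pmf(i, n, p)) for i in range(k + 1)))


def looks_germline(alt_counts, depths, expected_vaf=0.5, alpha=1e-3):
    """Return True if the pooled VAF is not significantly below expected_vaf.

    alt_counts and depths hold, per clone, the number of reads supporting
    the variant and the total read depth at that site.
    alpha is an illustrative (hypothetical) significance threshold.
    """
    pooled_alt = sum(alt_counts)
    pooled_depth = sum(depths)
    # One-sided test: how likely is a pooled count this low if the
    # variant were truly heterozygous in every clone?
    p_value = binom_cdf(pooled_alt, pooled_depth, expected_vaf)
    return p_value >= alpha


# A variant seen at about half the reads in every clone is kept as germline.
shared = looks_germline([15, 14, 16], [30, 30, 30])
# A variant confined to one clone has a pooled VAF far below 0.5.
private = looks_germline([15, 0, 0], [30, 30, 30])
print(shared, private)
```

Pooling across clones is what gives the test its power: a mutation that is clonal in one sample but absent from the others has a low pooled VAF, whereas an inherited variant stays near 0.5 however many clones are added. Sites in copy-number-altered regions or on sex chromosomes would need a different expected VAF.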

Key points

  • Sequoia defines the clonal relationships of normal cell populations based on whole-genome identification of somatic mutations from clones obtained through laser capture microdissections or in vitro expansions.

  • Somatic variants are called without a matched normal sample, passed through robust filters and used to build large phylogenetic trees of normal cells. The method has been successfully applied to a wide variety of human tissues, both fetal and adult.
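To make the idea of maximum parsimony concrete, the following minimal sketch scores candidate tree topologies with Fitch's small-parsimony algorithm on binary genotypes (1 = mutation present in a clone, 0 = absent). It illustrates the general principle only. The toy data, the nested-tuple tree encoding and the function names are assumptions made for this example and do not describe how Sequoia itself builds or searches trees.

```python
def fitch_score(tree, leaf_states):
    """Minimum number of state changes for one mutation on a fixed topology.

    tree: nested 2-tuples whose leaves are clone names (strings).
    leaf_states: dict mapping clone name -> 0 or 1 for this mutation.
    """
    def visit(node):
        if isinstance(node, str):
            return {leaf_states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:
            # Children can agree on a state, so no extra change is needed.
            return common, left_cost + right_cost
        # Children disagree: one change must occur on a branch below this node.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def tree_score(tree, genotypes):
    """Total parsimony score: sum of per-mutation Fitch scores."""
    return sum(fitch_score(tree, states) for states in genotypes.values())


# Hypothetical genotypes for four clones and three mutations.
genotypes = {
    "m1": {"A": 1, "B": 1, "C": 0, "D": 0},  # shared by A and B
    "m2": {"A": 0, "B": 0, "C": 1, "D": 1},  # shared by C and D
    "m3": {"A": 1, "B": 0, "C": 0, "D": 0},  # private to A
}

grouped = (("A", "B"), ("C", "D"))      # groups clones that share mutations
alternative = (("A", "C"), ("B", "D"))  # splits them apart

score_grouped = tree_score(grouped, genotypes)
score_alternative = tree_score(alternative, genotypes)
print(score_grouped, score_alternative)  # 3 5
```

The topology that groups clones sharing mutations explains the data with fewer changes (3 versus 5), so parsimony prefers it. A real analysis has to search over many topologies rather than compare two by hand, and the Fitch count treats gains and losses of a mutation symmetrically, which is a simplification for somatic data.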

Fig. 1: Schematic overview of Sequoia.
Fig. 2: Example of patterns of filtered mutations and final output.
Fig. 3: Sensitivity of variant calling.
Fig. 4: Clonality inferred from VAF distributions.
Fig. 5: Results from different variant callers.
Fig. 6: Common errors in phylogenies and solutions.
Fig. 7: Results of phylogenetic trees from Sequoia.

Data availability

Example data to run the pipeline can be found in the GitHub repository (https://github.com/TimCoorens/Sequoia).

Code availability

All code used for Sequoia, along with instructions, example data and expected output files, is available in the GitHub repository (https://github.com/TimCoorens/Sequoia).

Acknowledgements

We thank Y. Badran and M. Dave for critical review of the manuscript. This research was funded by the Wellcome Trust. T.H.H.C. is funded by a European Molecular Biology Organization (EMBO) long-term fellowship (ALTF 172-2022). N.W. is supported by Cancer Research UK and the Alborada Trust. J.N. is funded by a Cancer Research UK Fellowship.

Author information

Contributions

T.H.H.C. and M.S.C. wrote the manuscript with contributions from all other authors. T.H.H.C., M.S.C. and N.W. contributed code to the protocol described here. J.N., I.M., M.R.S. and P.J.C. provided guidance and supervision throughout the development of the protocol.

Corresponding authors

Correspondence to Tim H. H. Coorens, Michael Spencer Chapman or Peter J. Campbell.

Ethics declarations

Competing interests

P.J.C., M.R.S. and I.M. are cofounders and stock holders of, and consultants for, Quotient Therapeutics Ltd.

Peer review

Peer review information

Nature Protocols thanks Anna Alemany, Leif Ludwig and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Coorens, T. H. H. et al. Nature 597, 387–392 (2021): https://doi.org/10.1038/s41586-021-03790-y

Spencer Chapman, M. et al. Nature 595, 85–90 (2021): https://doi.org/10.1038/s41586-021-03548-6

Mitchell, E. et al. Nature 606, 343–350 (2022): https://doi.org/10.1038/s41586-022-04786-y

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Coorens, T.H.H., Spencer Chapman, M., Williams, N. et al. Reconstructing phylogenetic trees from genome-wide somatic mutations in clonal samples. Nat Protoc (2024). https://doi.org/10.1038/s41596-024-00962-8

  • DOI: https://doi.org/10.1038/s41596-024-00962-8
