Abstract
Phylogenetic trees are a powerful means to display the evolutionary history of species, pathogens and, more recently, individual cells of the human body. Whole-genome sequencing of laser capture microdissections or expanded stem cells has allowed the discovery of somatic mutations in clones, which can be used as natural barcodes to reconstruct the developmental history of individual cells. Here we describe Sequoia, our pipeline to reconstruct lineage trees from clones of normal cells. Candidate somatic mutations are called against the human reference genome and filtered to exclude germline mutations and artifactual variants. These filtered somatic mutations form the basis for phylogeny reconstruction using a maximum parsimony framework. Lastly, we use a maximum likelihood framework to explicitly map mutations to branches in the phylogenetic tree. The resulting phylogenies can then serve as a basis for many subsequent analyses, including investigating embryonic development, tissue dynamics in health and disease, and mutational signatures. Sequoia can be readily applied to any clonal somatic mutation dataset, including single-cell DNA sequencing datasets, using the commands and scripts provided. Moreover, Sequoia is highly flexible and can be easily customized. Typically, the runtime of the core script ranges from minutes to an hour for datasets with a moderate number (50,000–150,000) of variants. Competent bioinformatic skills, including in-depth knowledge of the R programming language, are required. A high-performance computing cluster (one that is capable of running mutation-calling algorithms and other aspects of the analysis at scale) is also required, especially if handling large datasets.
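The germline-filtering step mentioned above can be illustrated with a small sketch. It assumes, for illustration only, that reads are pooled across all clones from one individual and that a one-sided exact binomial test is used: a heterozygous germline variant should sit near a variant allele fraction of 0.5 in the pooled data, whereas a somatic variant present in only some clones falls well below it. The function names, the `alpha` threshold and the read counts are hypothetical and are not taken from the Sequoia code.

```python
from math import exp, lgamma, log


def binom_cdf(k: int, n: int, p: float = 0.5) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed in log space so that deep coverage does not overflow.
    """
    total = 0.0
    for i in range(k + 1):
        log_term = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p)
        )
        total += exp(log_term)
    return min(total, 1.0)


def is_likely_somatic(alt_counts, depths, alpha=1e-3):
    """Pool reads across samples and test for a VAF significantly below 0.5.

    alt_counts / depths: per-sample variant-supporting reads and total reads.
    alpha: hypothetical significance threshold (an assumption of this sketch).
    """
    alt, depth = sum(alt_counts), sum(depths)
    if depth == 0:
        return False  # no coverage, so no evidence either way
    return binom_cdf(alt, depth) < alpha


# Variant seen at roughly half the reads in every clone: looks germline.
print(is_likely_somatic([14, 16, 15, 13], [30, 30, 30, 30]))  # False
# Variant confined to one of four clones: pooled VAF far below 0.5.
print(is_likely_somatic([15, 0, 0, 0], [30, 30, 30, 30]))     # True
```

A test like this works without a matched normal sample, but it loses power when a somatic variant is shared by most clones, so in practice it would be combined with other filters.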
Key points
- Sequoia defines the clonal relationships of normal cell populations based on whole-genome identification of somatic mutations from clones obtained through laser capture microdissections or in vitro expansions.
- Somatic variants are called without a matched normal sample, passed through robust filters and used to build large phylogenetic trees of normal cells. The method has been successfully applied to a wide variety of human tissues, both fetal and adult.
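To make the idea of building trees from filtered somatic variants more concrete, the sketch below scores candidate trees by maximum parsimony using the Fitch algorithm on presence/absence genotypes. It is a toy illustration under stated assumptions, not the pipeline's implementation: the nested-tuple tree encoding, the function names and the four-clone genotype matrix are all hypothetical, and real tools search tree space heuristically rather than scoring hand-written topologies.

```python
def fitch(tree, states):
    """Fitch pass for one site on a rooted binary tree.

    tree:   a leaf name (str) or a pair (left_subtree, right_subtree)
    states: dict mapping leaf name -> 0 (absent) or 1 (present)
    Returns (possible ancestral states, minimum number of state changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed here.
        return common, left_cost + right_cost
    # Children disagree: one change is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, genotypes):
    """Sum the per-site Fitch costs; genotypes maps leaf -> list of 0/1."""
    n_sites = len(next(iter(genotypes.values())))
    return sum(
        fitch(tree, {leaf: calls[i] for leaf, calls in genotypes.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical genotype matrix: 4 clones x 3 mutations.
genotypes = {
    "A": [1, 1, 0],
    "B": [1, 0, 0],
    "C": [0, 0, 1],
    "D": [0, 0, 1],
}
tree_1 = (("A", "B"), ("C", "D"))  # groups clones that share mutations
tree_2 = (("A", "C"), ("B", "D"))  # splits them apart

print(parsimony_score(tree_1, genotypes))  # 3
print(parsimony_score(tree_2, genotypes))  # 5
```

The tree that groups clones by their shared mutations needs fewer changes and is preferred under parsimony. This sketch does not force the root to carry the unmutated state, which a somatic-mutation analysis would normally require.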
Data availability
Example data to run the pipeline can be found in the GitHub repository (https://github.com/TimCoorens/Sequoia).
Code availability
All code used for Sequoia, along with instructions, example data and expected output files, is available in the GitHub repository (https://github.com/TimCoorens/Sequoia).
Acknowledgements
We thank Y. Badran and M. Dave for critical review of the manuscript. This research was funded by the Wellcome Trust. T.H.H.C. is funded by a European Molecular Biology Organization (EMBO) long-term fellowship (ALTF 172-2022). N.W. is supported by Cancer Research UK and the Alborada Trust. J.N. is funded by a Cancer Research UK Fellowship.
Author information
Contributions
T.H.H.C. and M.S.C. wrote the manuscript with contributions from all other authors. T.H.H.C., M.S.C. and N.W. contributed code to the protocol described here. J.N., I.M., M.R.S. and P.J.C. provided guidance and supervision throughout the development of the protocol.
Ethics declarations
Competing interests
P.J.C., M.R.S. and I.M. are cofounders and stock holders of, and consultants for, Quotient Therapeutics Ltd.
Peer review
Peer review information
Nature Protocols thanks Anna Alemany, Leif Ludwig and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Key references using this protocol
Coorens, T. H. H. et al. Nature 597, 387–392 (2021): https://doi.org/10.1038/s41586-021-03790-y
Spencer Chapman, M. et al. Nature 595, 85–90 (2021): https://doi.org/10.1038/s41586-021-03548-6
Mitchell, E. et al. Nature 606, 343–350 (2022): https://doi.org/10.1038/s41586-022-04786-y
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Coorens, T.H.H., Spencer Chapman, M., Williams, N. et al. Reconstructing phylogenetic trees from genome-wide somatic mutations in clonal samples. Nat Protoc (2024). https://doi.org/10.1038/s41596-024-00962-8
DOI: https://doi.org/10.1038/s41596-024-00962-8